http://software.intel.com/en-us/intel-cilk-plus, pp.19-20, 2013. ,
PLASMA users' guide, Tech. Rep, 2009. ,
Muskel: an expandable skeleton environment, Scalable Computing: Practice and Experience, 2001. ,
Optimization techniques for implementing parallel skeletons in grid environments, Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pp.35-47, 2004. ,
FastFlow: high-level and efficient streaming on multi-core, Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13, 2014. ,
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
DOI : 10.1145/1465482.1465560
STAPL: An Adaptive, Generic Parallel C++ Library, Languages and Compilers for Parallel Computing, pp.193-208, 2003. ,
DOI : 10.1007/3-540-35767-X_13
Cache size in a cost model for heterogeneous skeletons, Proceedings of the fifth international workshop on High-level parallel programming and applications, HLPP '11, pp.3-10, 2011. ,
DOI : 10.1145/2034751.2034755
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
The design of OpenMP tasks. Parallel and Distributed Systems, IEEE Transactions on, vol.20, issue.22, pp.404-418, 2009. ,
P3L: A structured high-level parallel language, and its structured support. Concurrency: Practice and Experience, pp.225-255, 1995. ,
Efficient management of parallelism in object oriented numerical software libraries, Modern Software Tools in Scientific Computing, pp.163-202, 1997. ,
Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994. ,
DOI : 10.1137/1.9781611971538
Evaluating the Performance of Skeleton-Based High Level Parallel Programs, Computational Science - ICCS 2004, pp.289-296, 2004. ,
DOI : 10.1007/978-3-540-24688-6_40
URL : https://hal.archives-ouvertes.fr/hal-00807288
Flexible Skeletal Programming with eSkel, Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par'05, p.761, 2005. ,
DOI : 10.1007/11549468_83
URL : https://hal.archives-ouvertes.fr/hal-00807021
Analysis of programs for parallel processing, IEEE Transactions on Electronic Computers, vol.EC-15, issue.5, pp.757-763, 1966. ,
The pricing of options and corporate liabilities. The Journal of Political Economy, pp.637-654, 1973. ,
Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999. ,
From Serial Loops to Parallel Execution on Distributed Systems, Euro-Par 2012 Parallel Processing, pp.246-257 ,
DOI : 10.1007/978-3-642-32820-6_25
Lattice Boltzmann Equation on a Two-Dimensional Rectangular Grid, Journal of Computational Physics, vol.172, issue.2, pp.704-717, 2001. ,
DOI : 10.1006/jcph.2001.6850
Measuring synchronisation and scheduling overheads in OpenMP, Proceedings of First European Workshop on OpenMP, p.49, 1999. ,
A Microbenchmark Suite for OpenMP Tasks, OpenMP in a Heterogeneous World, pp.271-274 ,
DOI : 10.1007/978-3-642-30961-8_24
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Parallel programmability and the chapel language, International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007. ,
X10, ACM SIGPLAN Notices, vol.40, issue.10, pp.519-538, 2005. ,
DOI : 10.1145/1103845.1094852
URL : https://hal.archives-ouvertes.fr/in2p3-00166974
Automatic Parallelization of Array-oriented Programs for a Multi-core Machine, International Journal of Parallel Programming, vol.27, issue.1, pp.514-531 ,
DOI : 10.1007/s10766-012-0197-6
Scheduling Theory and its Applications, Journal of the Operational Research Society, vol.48, issue.7, pp.764-765, 1997. ,
DOI : 10.1057/palgrave.jors.2600829
Enhancing Muesli's Data Parallel Skeletons for Multi-core Computer Architectures, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC), pp.108-113, 2010. ,
DOI : 10.1109/HPCC.2010.23
Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004. ,
DOI : 10.1016/j.parco.2003.12.002
Algorithmic skeletons: structured management of parallel computation, 1989. ,
LogP: Towards a realistic model of parallel computation, 1993. ,
Parallel programming using skeleton functions, PARLE'93 Parallel Architectures and Languages Europe, pp.146-160, 1993. ,
DOI : 10.1007/3-540-56891-3_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.6680
Auto-tuning SkePU, Proceeding of the 4th international workshop on Multicore software engineering, IWMSE '11, pp.25-32, 2011. ,
DOI : 10.1145/1984693.1984697
LINPACK Benchmark, pp.803-820, 2003. ,
DOI : 10.1007/978-0-387-09766-4_155
Domain-Specific Optimization Strategy for Skeleton Programs, Euro-Par 2007 Parallel Processing, pp.705-714 ,
DOI : 10.1007/978-3-540-74466-5_74
The numerical template toolbox: A modern C++ design for scientific computing, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3240-3253 ,
DOI : 10.1016/j.jpdc.2014.07.002
URL : https://hal.archives-ouvertes.fr/hal-01061305
Boost.SIMD: generic programming for portable SIMDization, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pp.431-432 ,
The numerical template toolbox, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01061305
Metaprogramming applied to automatic SMP parallelization of linear algebra code, Euro-Par 2008 Parallel Processing, pp.729-738, 2008. ,
Parallelism in random access machines, Proceedings of the tenth annual ACM symposium on Theory of computing, STOC '78, pp.114-118, 1978. ,
DOI : 10.1145/800133.804339
SAC - A Functional Array Language for Efficient Multi-threaded Execution, International Journal of Parallel Programming, vol.5, issue.4, pp.383-427, 2006. ,
DOI : 10.1007/s10766-006-0018-x
Fundamentals of queueing theory, 2008. ,
DOI : 10.1002/9781118625651
A general theory of computational scalability based on rational functions. arXiv preprint, 2008. ,
Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988. ,
DOI : 10.1145/42411.42415
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.509.6892
Amdahl's law in the multicore era, Computer, vol.41, issue.7, pp.33-38, 2008. ,
Building domain-specific embedded languages, ACM Computing Surveys, vol.28, issue.4es, p.196, 1996. ,
DOI : 10.1145/242224.242477
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5006
OSL: Optimized Bulk Synchronous Parallel Skeletons on Distributed Arrays, Advanced Parallel Processing Technologies, Lecture Notes in Computer Science, vol.5737, pp.436-451, 2009. ,
DOI : 10.1145/79173.79181
URL : https://hal.archives-ouvertes.fr/inria-00452523
Harnessing the Multicores: Nested Data Parallelism in Haskell, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, volume 2 of Leibniz International Proceedings in Informatics (LIPIcs), pp.383-414, 2008. ,
DOI : 10.1007/978-3-540-89330-1_10
ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401 ,
DOI : 10.1109/ICPPW.2009.14
Windows with C++ - Visual C++ 2010 and the Parallel Patterns Library, MSDN Magazine, 2009. ,
A Skeleton Library, 2002. ,
DOI : 10.1007/3-540-45706-2_86
Towards a codelet-based runtime for exascale computing, Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, EXADAPT '12, pp.21-26 ,
DOI : 10.1145/2185475.2185478
STREAM: Sustainable memory bandwidth in high performance computers, a continually updated technical report, 1991. ,
Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Trans. Comput. Syst, vol.9, issue.1, pp.21-65, 1991. ,
Simple, fast, and practical nonblocking and blocking concurrent queue algorithms, Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, PODC '96, pp.267-275, 1996. ,
Computers and intractability: a guide to the theory of NP-completeness, 1979. ,
Proto, Proceedings of the 2007 Symposium on Library-Centric Software Design, LCSD '07, 2007. ,
DOI : 10.1145/1512762.1512767
Co-array Fortran for parallel programming, ACM SIGPLAN Fortran Forum, vol.17, pp.1-31, 1998. ,
Intel Threading Building Blocks: outfitting C++ for multicore processor parallelism, pp.20-21, 2010. ,
GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.3, pp.856-869, 1986. ,
DOI : 10.1137/0907058
Scheduling and load balancing in parallel and distributed systems, 1995. ,
Notable design patterns for domain-specific languages, Journal of Systems and Software, vol.56, issue.1, pp.91-99, 2001. ,
DOI : 10.1016/S0164-1212(00)00089-3
Reevaluating Amdahl's law in the multicore era, Journal of Parallel and Distributed Computing, vol.70, issue.2, pp.183-188, 2010. ,
Scalable problems and memory-bounded speedup, Journal of Parallel and Distributed Computing, vol.19, issue.1, pp.27-37, 1993. ,
Model transformations and tool integration. Software & Systems Modeling, pp.112-122, 2005. ,
A bridging model for parallel computation, Communications of the ACM, vol.33, issue.8, pp.103-111, 1990. ,
A bridging model for multi-core computing, J. Comput. Syst. Sci, vol.77, issue.1, pp.154-166, 2011. ,
Expression templates, C++ Report, vol.7, pp.26-31, 1995. ,
Qthreads: An API for programming with millions of lightweight threads, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008. ,
DOI : 10.1109/IPDPS.2008.4536359
Roofline, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009. ,
DOI : 10.1145/1498765.1498785
Load Balancing in Parallel Computers: Theory and Practice, 1997. ,