Intel Cilk Plus, http://software.intel.com/en-us/intel-cilk-plus, pp.19-20, 2013.

E. Agullo, J. Dongarra, B. Hadri, J. Kurzak, J. Langou et al., PLASMA users' guide, Tech. Rep., 2009.

M. Aldinucci, M. Danelutto, and P. Dazzi, Muskel: an expandable skeleton environment, Scalable Computing: Practice and Experience, 2001.

M. Aldinucci, M. Danelutto, and J. Dünnweber, Optimization techniques for implementing parallel skeletons in grid environments, Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pp.35-47, 2004.

M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, FastFlow: high-level and efficient streaming on multi-core, Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13, 2014.

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.
DOI : 10.1145/1465482.1465560

P. An, A. Jula, S. Rus, S. Saunders, T. Smith et al., STAPL: An Adaptive, Generic Parallel C++ Library, Languages and Compilers for Parallel Computing, pp.193-208, 2003.
DOI : 10.1007/3-540-35767-X_13

K. Armih, G. Michaelson, and P. Trinder, Cache size in a cost model for heterogeneous skeletons, Proceedings of the fifth international workshop on High-level parallel programming and applications, HLPP '11, pp.3-10, 2011.
DOI : 10.1145/2034751.2034755

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin et al., The design of OpenMP tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009.

B. Bacci, M. Danelutto, S. Orlando, S. Pelagatti, and M. Vanneschi, P3L: A structured high-level parallel language, and its structured support, Concurrency: Practice and Experience, pp.225-255, 1995.

S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, Efficient management of parallelism in object oriented numerical software libraries, Modern Software Tools in Scientific Computing, pp.163-202, 1997.

R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994.
DOI : 10.1137/1.9781611971538

A. Benoit, M. Cole, S. Gilmore, and J. Hillston, Evaluating the Performance of Skeleton-Based High Level Parallel Programs, Computational Science -ICCS 2004, pp.289-296, 2004.
DOI : 10.1007/978-3-540-24688-6_40

URL : https://hal.archives-ouvertes.fr/hal-00807288

A. Benoit, M. Cole, S. Gilmore, and J. Hillston, Flexible Skeletal Programming with eSkel, Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par'05, p.761, 2005.
DOI : 10.1007/11549468_83

URL : https://hal.archives-ouvertes.fr/hal-00807021

A. J. Bernstein, Analysis of programs for parallel processing, IEEE Transactions on Electronic Computers, vol.EC-15, issue.5, pp.757-763, 1966.

F. Black and M. Scholes, The pricing of options and corporate liabilities, The Journal of Political Economy, pp.637-654, 1973.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999.

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, and J. Dongarra, From Serial Loops to Parallel Execution on Distributed Systems, Euro-Par 2012 Parallel Processing, pp.246-257, 2012.
DOI : 10.1007/978-3-642-32820-6_25

M. Bouzidi, D. Humières, P. Lallemand, and L. Luo, Lattice Boltzmann Equation on a Two-Dimensional Rectangular Grid, Journal of Computational Physics, vol.172, issue.2, pp.704-717, 2001.
DOI : 10.1006/jcph.2001.6850

J. Bull, Measuring synchronisation and scheduling overheads in OpenMP, Proceedings of First European Workshop on OpenMP, p.49, 1999.

J. M. Bull, F. Reid, and N. McDonnell, A Microbenchmark Suite for OpenMP Tasks, OpenMP in a Heterogeneous World, pp.271-274, 2012.
DOI : 10.1007/978-3-642-30961-8_24

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

B. L. Chamberlain, D. Callahan, and H. P. Zima, Parallel programmability and the Chapel language, International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007.

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra et al., X10, ACM SIGPLAN Notices, vol.40, issue.10, pp.519-538, 2005.
DOI : 10.1145/1103845.1094852

URL : https://hal.archives-ouvertes.fr/in2p3-00166974

W. Ching and D. Zheng, Automatic Parallelization of Array-oriented Programs for a Multi-core Machine, International Journal of Parallel Programming, vol.27, issue.1, pp.514-531
DOI : 10.1007/s10766-012-0197-6

P. Chrétienne, E. G. Coffman, J. K. Lenstra, and Z. Liu, Scheduling Theory and its Applications, Journal of the Operational Research Society, vol.48, issue.7, pp.764-765, 1997.
DOI : 10.1057/palgrave.jors.2600829

P. Ciechanowicz and H. Kuchen, Enhancing Muesli's Data Parallel Skeletons for Multi-core Computer Architectures, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC), pp.108-113, 2010.
DOI : 10.1109/HPCC.2010.23

M. Cole, Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004.
DOI : 10.1016/j.parco.2003.12.002

M. I. Cole, Algorithmic skeletons: structured management of parallel computation, 1989.

D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser et al., LogP: Towards a realistic model of parallel computation, 1993.

J. Darlington, A. J. Field, P. G. Harrison et al., Parallel programming using skeleton functions, PARLE'93 Parallel Architectures and Languages Europe, pp.146-160, 1993.
DOI : 10.1007/3-540-56891-3_12

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

U. Dastgeer, J. Enmyren, and C. W. Kessler, Auto-tuning SkePU, Proceeding of the 4th international workshop on Multicore software engineering, IWMSE '11, pp.25-32, 2011.
DOI : 10.1145/1984693.1984697

J. J. Dongarra, P. Luszczek, and A. Petitet, LINPACK Benchmark, pp.803-820, 2003.
DOI : 10.1007/978-0-387-09766-4_155

K. Emoto, K. Matsuzaki, Z. Hu, and M. Takeichi, Domain-Specific Optimization Strategy for Skeleton Programs, Euro-Par 2007 Parallel Processing, pp.705-714, 2007.
DOI : 10.1007/978-3-540-74466-5_74

P. Estérie, J. Falcou, M. Gaunard, J. Lapresté, and L. Lacassagne, The numerical template toolbox: A modern C++ design for scientific computing, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3240-3253, 2014.
DOI : 10.1016/j.jpdc.2014.07.002

URL : https://hal.archives-ouvertes.fr/hal-01061305

P. Estérie, M. Gaunard, J. Falcou, J. Lapresté, and B. Rozoy, Boost.SIMD: generic programming for portable SIMDization, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pp.431-432, 2012.

J. Falcou, M. Gaunard, and J. Lapresté, The numerical template toolbox, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01061305

J. Falcou, J. Sérot, L. Pech, and J. Lapresté, Metaprogramming applied to automatic SMP parallelization of linear algebra code, Euro-Par 2008 – Parallel Processing, pp.729-738, 2008.

S. Fortune and J. Wyllie, Parallelism in random access machines, Proceedings of the tenth annual ACM symposium on Theory of computing, STOC '78, pp.114-118, 1978.
DOI : 10.1145/800133.804339

C. Grelck and S. Scholz, SAC - A Functional Array Language for Efficient Multi-threaded Execution, International Journal of Parallel Programming, vol.5, issue.4, pp.383-427, 2006.
DOI : 10.1007/s10766-006-0018-x

D. Gross, Fundamentals of queueing theory, 2008.
DOI : 10.1002/9781118625651

N. J. Gunther, A general theory of computational scalability based on rational functions, arXiv preprint, 2008.

J. L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988.
DOI : 10.1145/42411.42415

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

M. D. Hill and M. R. Marty, Amdahl's law in the multicore era, Computer, vol.41, issue.7, pp.33-38, 2008.

P. Hudak, Building domain-specific embedded languages, ACM Computing Surveys, vol.28, issue.4es, p.196, 1996.
DOI : 10.1145/242224.242477

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

N. Javed and F. Loulergue, OSL: Optimized Bulk Synchronous Parallel Skeletons on Distributed Arrays, Advanced Parallel Processing Technologies, Lecture Notes in Computer Science, vol.5737, pp.436-451, 2009.

URL : https://hal.archives-ouvertes.fr/inria-00452523

S. Peyton Jones, R. Leshchinskiy, G. Keller, and M. M. T. Chakravarty, Harnessing the Multicores: Nested Data Parallelism in Haskell, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, volume 2 of Leibniz International Proceedings in Informatics (LIPIcs), pp.383-414, 2008.
DOI : 10.1007/978-3-540-89330-1_10

H. Kaiser, M. Brodowicz, and T. Sterling, ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401, 2009.
DOI : 10.1109/ICPPW.2009.14

K. Kerr, Windows with C++: Visual C++ 2010 and the Parallel Patterns Library, MSDN Magazine, 2009.

H. Kuchen, A Skeleton Library, 2002.
DOI : 10.1007/3-540-45706-2_86

C. Lauderdale and R. Khan, Towards a codelet-based runtime for exascale computing, Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, EXADAPT '12, pp.21-26, 2012.
DOI : 10.1145/2185475.2185478

J. D. McCalpin, STREAM: Sustainable memory bandwidth in high performance computers, a continually updated technical report, 1991.

J. M. Mellor-Crummey and M. L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Trans. Comput. Syst., vol.9, issue.1, pp.21-65, 1991.

M. M. Michael and M. L. Scott, Simple, fast, and practical nonblocking and blocking concurrent queue algorithms, Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, PODC '96, pp.267-275, 1996.

M. R. Garey and D. S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, 1979.

E. Niebler, Proto, Proceedings of the 2007 Symposium on Library-Centric Software Design, LCSD '07, 2007.
DOI : 10.1145/1512762.1512767

R. W. Numrich and J. Reid, Co-array Fortran for parallel programming, ACM SIGPLAN Fortran Forum, vol.17, pp.1-31, 1998.

J. Reinders, Intel Threading Building Blocks: outfitting C++ for multicore processor parallelism, pp.20-21, 2010.

Y. Saad and M. H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.3, pp.856-869, 1986.
DOI : 10.1137/0907058

B. A. Shirazi, K. M. Kavi, and A. R. Hurson, Scheduling and load balancing in parallel and distributed systems, 1995.

D. Spinellis, Notable design patterns for domain-specific languages, Journal of Systems and Software, vol.56, issue.1, pp.91-99, 2001.
DOI : 10.1016/S0164-1212(00)00089-3

X.-H. Sun and Y. Chen, Reevaluating Amdahl's law in the multicore era, Journal of Parallel and Distributed Computing, vol.70, issue.2, pp.183-188, 2010.

X.-H. Sun and L. M. Ni, Scalable problems and memory-bounded speedup, Journal of Parallel and Distributed Computing, vol.19, issue.1, pp.27-37, 1993.

L. Tratt, Model transformations and tool integration, Software & Systems Modeling, pp.112-122, 2005.

L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol.33, issue.8, pp.103-111, 1990.
DOI : 10.1145/79173.79181

L. G. Valiant, A bridging model for multi-core computing, J. Comput. Syst. Sci., vol.77, issue.1, pp.154-166, 2011.

T. Veldhuizen, Expression templates, C++ Report, vol.7, pp.26-31, 1995.

K. B. Wheeler, R. C. Murphy, and D. Thain, Qthreads: An API for programming with millions of lightweight threads, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536359

S. Williams, A. Waterman, and D. Patterson, Roofline, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785

C. Xu and F. C. Lau, Load Balancing in Parallel Computers: Theory and Practice, 1997.