E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2010.
DOI : 10.1016/B978-0-12-385963-1.00034-4

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing - 19th International Conference, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009.
DOI : 10.1088/1742-6596/180/1/012037

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

P. R. Amestoy, I. S. Duff, and C. Puglisi, Multifrontal QR Factorization in a Multiprocessor Environment, Numerical Linear Algebra with Applications, vol.3, issue.4, pp.275-300, 1996.
DOI : 10.1002/(SICI)1099-1506(199607/08)3:4<275::AID-NLA83>3.0.CO;2-7

OpenMP Architecture Review Board, The OpenMP API specification for parallel programming, 2012.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

C. Augonnet, Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, 2011.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Proceedings of the 15th Euro-Par Conference, pp.863-874, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.66
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295

A. Buttari, Fine-Grained Multithreading for the Multifrontal $QR$ Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, 2013.
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471

W. Carlson, J. M. Draper, D. E. Culler, K. Yelick, E. Brooks et al., Introduction to UPC and Language Specification, 1999.

P. Carribault, M. Pérache, and H. Jourdren, Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC, Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Proceedings, pp.1-14, 2010.

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797

G. F. Diamos and S. Yalamanchili, Harmony: an execution model and runtime for heterogeneous many core systems, HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing, pp.197-200, 2008.

K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

M. Frigo and S. G. Johnson, The Design and Implementation of FFTW3, Proceedings of the IEEE, pp.216-231, 2005.
DOI : 10.1109/JPROC.2004.840301

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998.
DOI : 10.1145/277652.277725

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

T. Gautier, J. V. F. Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1299-1308, 2013.
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904

L. Genovese, B. Videau, M. Ospici, T. Deutsch, S. Goedecker et al., Daubechies wavelets for high performance electronic structure calculations: The BigDFT project, Comptes Rendus Mécanique, vol.339, issue.2-3, pp.339-341, 2011.
DOI : 10.1016/j.crme.2010.12.003

W. Gropp, MPICH2: A new start for MPI implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, 2002.

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par 2010 - Parallel Processing, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448

J. K. Hollingsworth (ed.), SC Conference on High Performance Computing Networking, Storage and Analysis, SC '12, 2012.

C. Iancu, S. A. Hofmeyr, F. Blagojevic, and Y. Zheng, Oversubscription on multicore processors, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-11, 2010.
DOI : 10.1109/IPDPS.2010.5470434
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.168.4615

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kalé, and T. R. Quinn, Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11
DOI : 10.1109/SC.2010.49
URL : http://charm.cs.illinois.edu/newPapers/10-16/paper.pdf

J. Kurzak and J. Dongarra, Fully dynamic scheduler for numerical computing on multicore processors, LAPACK Working Note 220, 2009.

R. Lepere, G. Mounie, and D. Trystram, An approximation algorithm for scheduling trees of malleable tasks, European Journal of Operational Research, vol.142, issue.2, pp.242-249, 2002.
DOI : 10.1016/S0377-2217(02)00264-3
URL : https://hal.archives-ouvertes.fr/hal-00001546

A. Marochko, TBB 3.0 task scheduler improves composability of TBB based solutions, 2010.
URL : http://software.intel.com/en-us/blogstbb-30-task-scheduler-improves-composability-of-tbb-based-solutions-part-1

R. Namyst and J. Méhaut, Marcel : Une bibliothèque de processus légers

J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger et al., A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, vol.26, issue.1, pp.80-113, 2007.
DOI : 10.1016/j.rti.2005.04.002

H. Pan, B. Hindman, and K. Asanović, Composing parallel software efficiently with Lithe, ACM SIGPLAN Notices, vol.45, issue.6, pp.376-387, 2010.
DOI : 10.1145/1809028.1806639
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.2385

A. Pothen and C. Sun, A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993.
DOI : 10.1137/0914074

G. N. S. Prasanna and B. R. Musicus, The optimal control approach to generalized multiprocessor scheduling, Algorithmica, vol.2, issue.4, pp.17-49, 1996.
DOI : 10.1007/BF01942605

J. Reinders, Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism, 2007.

M. Schreiber, Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management, 2014.

R. Schreiber, A New Implementation of Sparse Gaussian Elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982.
DOI : 10.1145/356004.356006

M. Snir and S. Otto, MPI-The Complete Reference: The MPI Core, 1998.

J. E. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, pp.66-73, 2010.
DOI : 10.1109/MCSE.2010.69

G. Teodoro, R. Sachetto, O. Sertel, M. N. Gurcan, W. Meira et al., Coordinating the use of GPU and CPU for improving performance of compute intensive applications, 2009 IEEE International Conference on Cluster Computing and Workshops, pp.1-10, 2009.
DOI : 10.1109/CLUSTR.2009.5289193

S. Thibault, R. Namyst, and P. Wacrenier, Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, pp.42-51, 2007.
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506

K. Tian, Y. Jiang, X. Shen, and W. Mao, Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors, Lecture Notes in Computer Science, vol.7698, 2013.
DOI : 10.1007/978-3-642-35867-8_7

S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-8, 2010.
DOI : 10.1109/IPDPSW.2010.5470941
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.157.7245

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.122

S. Treichler, M. Bauer, and A. Aiken, Realm: An event-based low-level runtime for distributed memory architectures, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.263-276, 2014.
DOI : 10.1145/2628071.2628084

A. Hugo, Le problème de la composition parallèle : une approche supervisée, 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013.

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, Composing multiple StarPU applications over heterogeneous machines: A supervised approach, IPDPS'13 Workshops, Workshop on Accelerators and Hybrid Exascale Systems (AsHES), pp.1050-1059, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00824514

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, Composing multiple StarPU applications over heterogeneous machines: A supervised approach, The International Journal of High Performance Computing Applications, vol.28, pp.285-300, 2014.
DOI : 10.1109/MCSE.2009.154
URL : https://hal.archives-ouvertes.fr/hal-00824514

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, A Runtime Approach to Dynamic Resource Allocation for Sparse Direct Solvers, 2014 43rd International Conference on Parallel Processing, 2014.
DOI : 10.1109/ICPP.2014.57
URL : https://hal.archives-ouvertes.fr/hal-01101054