E. Agullo, C. Augonnet, J. Dongarra et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90

URL : https://hal.archives-ouvertes.fr/inria-00547614

E. Agullo, C. Augonnet, J. Dongarra et al., Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547847

E. Agullo, O. Aumage, B. Bramas et al., Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01372022

E. Agullo, O. Beaumont, L. Eyraud-Dubois, and S. Kumar, Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.1021-1030, 2016.
DOI : 10.1109/IPDPS.2016.90

URL : https://hal.archives-ouvertes.fr/hal-01223573

E. Agullo, B. Bramas, O. Coulaud et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, 2016.
DOI : 10.1002/cpe.3723

URL : https://hal.archives-ouvertes.fr/hal-00974674

E. Agullo, J. Demmel, J. Dongarra et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

P. Alpatov, G. Baker, C. Edwards et al., PLAPACK, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '97, pp.1-16, 1997.
DOI : 10.1145/509593.509622

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: A Portable Linear Algebra Library for High-performance Computers, Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, Supercomputing '90, pp.2-11, 1990.

C. Augonnet, Scheduling Tasks over Multicore machines enhanced with accelerators: a Runtime System's Perspective. Theses, Université Bordeaux 1. URL https, 2011.

C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures. In International Euro-Par Workshops HPPC'09, volume 6043 of Lecture Notes in Computer Science, pp.56-65, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00421333

C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
DOI : 10.1007/978-3-642-03869-3_80

URL : https://hal.archives-ouvertes.fr/inria-00384363

A. Bakhoda, G. L. Yuan, W. W. Fung, H. Wong, and T. M. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
DOI : 10.1109/ISPASS.2009.4919648

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.507.8371

P. Baptiste, C. Le Pape, and W. Nuijten, Constraint-based scheduling: applying constraint programming to scheduling problems, vol.39, 2012.
DOI : 10.1007/978-1-4615-1479-4

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-66, 2012.
DOI : 10.1109/SC.2012.71

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715

O. Beaumont, T. Cojean, L. Eyraud-Dubois et al., Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016.
DOI : 10.1109/HiPC.2016.045

URL : https://hal.archives-ouvertes.fr/hal-01361992

O. Beaumont, L. Eyraud-Dubois, and S. Kumar, Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs. Working paper or preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01386174

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK User's Guide, Society for Industrial and Applied Mathematics, 1997.
DOI : 10.1137/1.9780898719642

R. Bleuse, T. Gautier, J. V. Lima, G. Mounié et al., Scheduling Data Flow Program in XKaapi: A New Affinity Based Algorithm for Heterogeneous Architectures, pp.560-571, 2014.
DOI : 10.1007/978-3-319-09873-9_47

URL : https://hal.archives-ouvertes.fr/hal-01081629

R. Bleuse, S. Hunold, S. Kedad-Sidhoum et al., Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, IEEE Transactions on Parallel and Distributed Systems, 2016.
DOI : 10.1109/TPDS.2017.2675891

URL : https://hal.archives-ouvertes.fr/hal-01516752

R. Bleuse, S. Kedad-Sidhoum, F. Monna et al., Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, vol.20, issue.4, pp.1625-1638, 2015.
DOI : 10.1002/cpe.3359

URL : https://hal.archives-ouvertes.fr/hal-01081625

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson et al., Cilk, ACM SIGPLAN Notices, vol.30, issue.8, pp.207-216, 1995.
DOI : 10.1145/209937.209958

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999.
DOI : 10.1145/324133.324234

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.7695

V. Bonifaci and A. Wiese, Scheduling unrelated machines of few different types, 2012.

G. Bosilca, A. Bouteiller, A. Danalis et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011.
DOI : 10.1109/IPDPS.2011.299

G. Bosilca, A. Bouteiller, A. Danalis et al., PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, Computing in Science and Engineering, 2013.
DOI : 10.1109/mcse.2013.98

URL : https://hal.archives-ouvertes.fr/hal-00930217

H. Bouwmeester and J. Langou, A critical path approach to analyzing parallelism of algorithmic variants. Application to Cholesky inversion, 2010.

H. M. Bouwmeester, Tiled algorithms for matrix computations on multicore architectures, PhD thesis, 2012.

F. Broquedis, J. Clet-Ortega, S. Moreaud et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

P. Brucker and S. Knust, Complexity results for scheduling problems. Web document, 2009.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, LAPACK Working Note 191: A class of parallel tiled linear algebra algorithms for multicore architectures, 2007.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008.
DOI : 10.1007/978-3-540-68111-3_67

URL : http://arxiv.org/abs/0707.3548

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

H. Casanova, A. Legrand, and M. Quinson, SimGrid: a Generic Framework for Large-Scale Distributed Experiments, 10th IEEE International Conference on Computer Modeling and Simulation, 2008.
DOI : 10.1109/uksim.2008.28

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.182.5943

E. Chan, F. G. Van Zee, P. Bientinesi et al., SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.123-132, 2008.
DOI : 10.1145/1345206.1345227

J. Chandrasekar, I. S. Kim, D. S. Bernstein, and A. J. Ridley, Cholesky-based reduced-rank square-root Kalman filtering, 2008 American Control Conference, pp.3987-3992, 2008.
DOI : 10.1109/ACC.2008.4587116

H. Chetto, M. Silly, and T. Bouchentouf, Dynamic scheduling of real-time tasks under precedence constraints, Real-Time Systems, pp.181-194, 1990.
DOI : 10.1007/bf00365326

T. Cojean, A. Guermouche, A. Hugo et al., Exploiting two-level parallelism by aggregating computing resources in task-based applications over accelerator-based machines. Inria technical report, Inria, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01502749

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43

M. Cosnard, E. Jeannot, and T. Yang, SLC: Symbolic scheduling for executing parameterized task graphs on multiprocessors, Proceedings of the 1999 International Conference on Parallel Processing, pp.413-421, 1999.
DOI : 10.1109/ICPP.1999.797429

URL : https://hal.archives-ouvertes.fr/inria-00098842

J. Choi, J. Demmel, I. Dhillon et al., LAPACK Working Note 95: ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance, 1995.

D. Johnston, HPC Matters to our Quality of Life and Prosperity, 2014.

J. Dongarra, The LINPACK Benchmark: An explanation, Proceedings of the 1st International Conference on Supercomputing, pp.456-474, 1988.
DOI : 10.1007/3-540-18991-2_27

A. Duran, E. Ayguadé, R. M. Badia, J. Labarta, L. Martinell et al., OmpSs: A proposal for programming heterogeneous multi-core architectures, Parallel Processing Letters, pp.173-193, 2011.

P. Dutot, G. Mounié, and D. Trystram, Scheduling Parallel Tasks: Approximation Algorithms, Handbook of Scheduling: Algorithms, Models, and Performance Analysis, URL, vol.26, pp.26-27, 2004.

B. S. Garbow, EISPACK - A package of matrix eigensystem routines, Computer Physics Communications, vol.7, issue.4, pp.179-184, 1974.
DOI : 10.1016/0010-4655(74)90086-1

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

J. C. Gehrke, K. Jansen, S. E. Kraft et al., A PTAS for Scheduling Unrelated Machines of Few Different Types, pp.290-301, 2016.
DOI : 10.1007/978-3-662-49192-8_24

A. Gillman, Fast direct solvers for elliptic partial differential equations. Theses, University of Colorado. URL https://amath.colorado, 2011.

R. L. Graham, Bounds for Certain Multiprocessing Anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966.
DOI : 10.1002/j.1538-7305.1966.tb01709.x

R. L. Graham, Bounds on Multiprocessing Timing Anomalies, SIAM Journal on Applied Mathematics, vol.17, issue.2, pp.416-429, 1969.
DOI : 10.1137/0117039

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.8131

F. G. Gustavson, High-performance linear algebra algorithms using new generalized data structures for matrices, IBM Journal of Research and Development, vol.47, issue.1, pp.31-55, 2003.
DOI : 10.1147/rd.471.0031

M. Haugh, The Monte Carlo Framework, Examples from Finance and Generating Correlated Random Variables. Course Notes. URL http://www.columbia, 2004.

E. Hermann, B. Raffin, F. Faure et al., Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23

URL : https://hal.archives-ouvertes.fr/inria-00502448

N. J. Higham, Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, 2002.
DOI : 10.1137/1.9780898718027

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

C. Imreh, Scheduling Problems on Two Sets of Identical Machines, Computing, vol.70, issue.4, pp.277-294, 2003.
DOI : 10.1007/s00607-003-0011-9

L. Jaulmes, E. Ayguadé, M. Casas et al., Exploiting asynchrony from exact forward recovery for DUE in iterative solvers, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp.1-53, 2015.
DOI : 10.1145/2807591.2807599

C. H. Koelbel, The High performance Fortran handbook. Scientific and engineering computation, 1994.
DOI : 10.1063/1.4823319

J. K. Lenstra, D. B. Shmoys, and É. Tardos, Approximation algorithms for scheduling unrelated parallel machines, 1990.
DOI : 10.1109/sfcs.1987.8

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.708

H. Ltaief, D. Gratadour, A. Charara et al., Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '16, pp.1-9, 2016.
DOI : 10.1145/2929908.2929920

G. Manimaran and C. Siva Ram Murthy, An efficient dynamic scheduling algorithm for multiprocessor real-time systems, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.3, pp.312-319, 1998.
DOI : 10.1109/71.674322

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3939

M. Chabowski, How HPC Impacts Our Lives II: HPC (and Linux) in the Movies. URL https, 2016.

M. Feldman, First US Exascale Supercomputer Now On Track for 2021. URL https, 2016.

M. Feldman, China will deploy exascale prototype this year. URL https, 2017.

E. G. Ng and P. Raghavan, Performance of Greedy Ordering Heuristics for Sparse Cholesky Factorization, SIAM Journal on Matrix Analysis and Applications, vol.20, issue.4, pp.902-914, 1999.
DOI : 10.1137/S0895479897319313

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

W. Quach and J. Langou, A makespan lower bound for the scheduling of the tiled Cholesky factorization based on ALAP scheduling, 2015.

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, PPoPP '09, pp.121-130, 2009.

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee et al., Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, p.3614, 2009.
DOI : 10.1145/1527286.1527288

P. Raghavan, Distributed Sparse Matrix Factorization: QR and Cholesky Decompositions, PhD thesis, pp.92-14255, 1992.

P. Raghavan, K. Teranishi, E. G. Ng et al., A latency tolerant hybrid sparse solver using incomplete Cholesky factorization, Numerical Linear Algebra with Applications, pp.541-560, 2003.
DOI : 10.1002/nla.327

A. Rico, F. Cabarcas, C. Villavieja et al., On the simulation of large-scale architectures using multiple application abstraction levels, ACM Transactions on Architecture and Code Optimization, vol.8, issue.4, 2012.
DOI : 10.1145/2086696.2086715

R. Viola, Why do supercomputers matter for your everyday life? URL https://ec.europa.eu/digital-single-market/en/blog/why-do-supercomputers-matter-your-everyday-life, 2015.

A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield et al., The structural simulation toolkit, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, pp.37-42, 2011.
DOI : 10.1145/1964218.1964225

E. Rothberg and A. Gupta, An Efficient Block-Oriented Approach to Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.15, issue.6, pp.1413-1439, 1994.
DOI : 10.1137/0915085

V. Rotkin and S. Toledo, The design and implementation of a new out-of-core sparse Cholesky factorization method, ACM Transactions on Mathematical Software, vol.30, issue.1, pp.19-46, 2004.
DOI : 10.1145/974781.974783

V. Sarkar, Partitioning and scheduling parallel programs for multiprocessing. Research monographs in parallel and distributed computing, 1989.

A. Z. Semar Shahul and O. Sinnen, Scheduling task graphs optimally with A*, The Journal of Supercomputing, vol.51, issue.3, pp.310-332, 2010.

E. V. Shchepin and N. Vakhania, An optimal rounding gives a better approximation for scheduling unrelated machines, Operations Research Letters, 2005.
DOI : 10.1016/j.orl.2004.05.004

L. Stanisic, S. Thibault, A. Legrand et al., Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, 20th International Conference on Parallel Processing, 2014.
DOI : 10.1007/978-3-319-09873-9_5

URL : https://hal.archives-ouvertes.fr/hal-01011633

S. Blackford, The Two-dimensional Block-Cyclic Distribution, 1997.

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, 2010.
DOI : 10.1016/j.parco.2009.12.005

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.5082

H. Topcuoglu, S. Hariri, and M.-Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

R. Ubal, B. Jang, P. Mistry et al., Multi2Sim, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.335-344, 2012.
DOI : 10.1145/2370816.2370865

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.56

URL : https://hal.archives-ouvertes.fr/hal-01078359

F. Xhafa, L. Barolli, and A. Durresi, Immediate Mode Scheduling of Independent Jobs in Computational Grids, 21st International Conference on Advanced Networking and Applications (AINA '07), pp.970-977, 2007.
DOI : 10.1109/AINA.2007.78