K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The Landscape of Parallel Computing Research: A View from Berkeley, Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.28, issue.9, 2016.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: A portable linear algebra library for high-performance computers, Proceedings SUPERCOMPUTING '90, pp.2-11, 1990.
DOI : 10.1109/SUPERC.1990.129995

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin et al., The Design of OpenMP Tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009.
DOI : 10.1109/TPDS.2008.105

C. Addison, Y. Ren, and M. van Waveren, OpenMP issues arising in the development of parallel BLAS and LAPACK libraries, Scientific Programming, pp.95-104, 2003.

C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

L. Boillot, G. Bosilca, E. Agullo, and H. Calandra, Task-Based Programming for Seismic Imaging: Preliminary Results, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp.1259-1266, 2014.
DOI : 10.1109/HPCC.2014.205
URL : https://hal.archives-ouvertes.fr/hal-01057580

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and Phd Forum, PDSEC 2011, pp.1432-1441, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

J. Bircsak, P. Craig, R. Crowell, Z. Cvetanovic, J. Harris et al., Extending OpenMP for NUMA machines, Supercomputing, ACM/IEEE 2000 Conference, pp.48-48, 2000.

B. Smith, P. Bjørstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, 2004.

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '95, pp.207-216, 1995.

J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell et al., Productive Cluster Programming with OmpSs, Proceedings of the 17th international conference on Parallel processing - Volume Part I, Euro-Par '11, pp.555-566, 2011.

R. F. Boisvert, R. Pozo, K. Remington, R. F. Barrett, and J. J. Dongarra, Matrix Market: A web resource for test matrix collections, Proceedings of the IFIP TC2/WG2.5 Working Conference on Quality of Numerical Software: Assessment and Enhancement, pp.125-137, 1997.

F. Broquedis, De l'exécution d'applications scientifiques OpenMP sur architectures hiérarchiques [On the execution of OpenMP scientific applications on hierarchical architectures], PhD thesis, 2010.
DOI : 10.1007/978-3-540-79561-2_15
URL : https://hal.inria.fr/inria-00329934/document

M. A. Christie and M. J. Blunt, Tenth SPE Comparative Solution Project: A Comparison of Upscaling Techniques, SPE Reservoir Evaluation & Engineering, vol.4, issue.04, pp.308-317, 2001.
DOI : 10.2118/72469-PA

C. Chevalier and F. Pellegrini, PT-Scotch: A tool for efficient parallel graph ordering, Parallel Computing, vol.34, issue.6-8, pp.318-331, 2008.
DOI : 10.1016/j.parco.2007.12.001
URL : https://hal.archives-ouvertes.fr/hal-00402893

E. Chow and A. Patel, Fine-Grained Parallel Incomplete LU Factorization, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.C169-C193, 2015.
DOI : 10.1137/140968896

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, Workshop on General Purpose Processing on Graphics Processing Units, 2007.

J. J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, Achieving numerical accuracy and high performance using recursive tile LU factorization, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809765

I. S. Duff and G. A. Meurant, The effect of ordering on preconditioned conjugate gradients, BIT, vol.29, issue.4, pp.635-657, 1989.
DOI : 10.1007/BF01932738

S. Doi, On parallelism and convergence of Incomplete LU factorizations, Applied Numerical Mathematics, vol.7, issue.5, pp.417-436, 1991.

B. Goglin and N. Furmento, Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-9, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172

T. Gautier, F. Lementec, V. Faucher, and B. Raffin, X-kaapi: A Multi Paradigm Runtime for Multicore Architectures, 2013 42nd International Conference on Parallel Processing, 2013.
DOI : 10.1109/ICPP.2013.86
URL : https://hal.archives-ouvertes.fr/hal-00727827

A. Gerasoulis and T. Yang, A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors, Journal of Parallel and Distributed Computing, vol.16, issue.4, pp.276-291, 1992.
DOI : 10.1016/0743-7315(92)90012-C

Y. Kwok and I. Ahmad, Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Computing Surveys, vol.31, issue.4, pp.406-471, 1999.
DOI : 10.1145/344588.344618

G. Karypis, Multi-Constraint Mesh Partitioning for Contact/Impact Computations, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, p.56, 2003.
DOI : 10.1145/1048935.1050206

G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.359-392, 1998.
DOI : 10.1137/S1064827595287997

A. Kleen, A NUMA API for Linux, Novell Inc., 2005.

A. A. Khan, C. L. McCreary, and M. S. Jones, A Comparison of Multiprocessor Scheduling Heuristics, 1994 International Conference on Parallel Processing (ICPP'94), pp.243-250, 1994.
DOI : 10.1109/ICPP.1994.19

S. Lankes, B. Bierbaum, and T. Bemmerl, Affinity-On-Next-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010.
DOI : 10.1007/978-3-642-14390-8_60

C. E. Leiserson, The Cilk++ concurrency platform, The Journal of Supercomputing, vol.51, pp.522-527, 2009.

H. Löf and S. Holmgren, Affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.387-392, 2005.

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic Linear Algebra Subprograms for Fortran Usage, ACM Transactions on Mathematical Software, vol.5, issue.3, pp.308-323, 1979.
DOI : 10.1145/355841.355847

R. J. Lipton, D. J. Rose, and R. E. Tarjan, Generalized nested dissection, SIAM Journal on Numerical Analysis, vol.16, issue.2, pp.346-358, 1979.

B. Lizé, G. Sylvand, E. Agullo, and S. Thibault, A task-based H-matrix solver for acoustic and electromagnetic problems on multicore architectures, SciCADE, the International Conference on Scientific Computation and Differential Equations, 2013.

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2014.

S. Liu, Y. Zhang, X. Sun, and R. Qiu, Performance Evaluation of Multithreaded Sparse Matrix-Vector Multiplication Using OpenMP, 2009 11th IEEE International Conference on High Performance Computing and Communications, pp.659-665, 2009.
DOI : 10.1109/HPCC.2009.75

J. D. McCalpin, STREAM: Sustainable Memory Bandwidth in High Performance Computers, a continually updated technical report, 1991.

H. A. Mandviwala, U. Ramachandran, and K. Knobe, Capsules: Expressing composable computations in a parallel programming model, Languages and Compilers for Parallel Computing, pp.276-291, 2008.

F. P. Miller, A. F. Vandome, and J. McBrewster, Advanced Configuration and Power Interface: Open Standard, Operating System Sleep Mode, Hibernate (OS Feature), Synonym, 2009.

A. Podobas, M. Brorsson, and K. Faxén, A comparison of some recent task-based parallel programming models, 3rd Workshop on Programmability Issues for Multi-Core Computers, 2010.

F. Pellegrini and J. Roman, Sparse matrix ordering with Scotch, High-Performance Computing and Networking, pp.370-378, 1997.
DOI : 10.1007/BFb0031609

C. Pousa Ribeiro, M. Castro, J.-F. Méhaut, and A. Carissimi, Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas, High Performance Computing for Computational Science - VECPAR 2010, pp.279-292, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00788872

R. Rahman, Intel® Xeon Phi™ Coprocessor Architecture and Tools: The Guide for Application Developers, Apress, 2013.

J. Reinders, Intel Threading Building Blocks, O'Reilly, 2007.

C. Rossignon, P. Henon, O. Aumage, and S. Thibault, A NUMA-Aware Fine Grain Parallelization Framework for Multi-core Architecture, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1381-1390, 2013.
DOI : 10.1109/IPDPSW.2013.204
URL : https://hal.archives-ouvertes.fr/hal-00858350

R. Rabenseifner, G. Hager, and G. Jost, Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.427-436, 2009.
DOI : 10.1109/PDP.2009.43

C. Pousa Ribeiro, J.-F. Méhaut, A. Carissimi, M. Castro, and L. G. Fernandes, Memory affinity for hierarchical shared memory multiprocessors, 21st International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'09, pp.59-66, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00788914

C. Rossignon, Optimisation du produit matrice-vecteur creux sur architecture GPU pour un simulateur de réservoir [Optimization of the sparse matrix-vector product on GPU architectures for a reservoir simulator], 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013.

Y. Saad, ILUT: A dual threshold incomplete LU factorization, Numerical Linear Algebra with Applications, pp.387-402, 1994.

Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, 1996.
DOI : 10.1137/1.9780898718003

J. L. Sobral and A. J. Proença, Dynamic grain-size adaptation on object-oriented parallel programming: the SCOOPP approach, Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing, IPPS '99/SPDP '99, pp.728-732, 1999.

S. Thibault, A flexible thread scheduler for hierarchical multiprocessor machines, Second International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2), 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000138

H. Topcuoglu, S. Hariri, and M. Wu, Task scheduling algorithms for heterogeneous processors, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99), 1999.
DOI : 10.1109/HCW.1999.765092

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.

S. Thibault, R. Namyst, and P.-A. Wacrenier, Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, 2007.
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506

H. Vandierendonck, G. Tzenakis, and D. S. Nikolopoulos, A Unified Scheduler for Recursive and Task Dataflow Parallelism, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.1-11, 2011.
DOI : 10.1109/PACT.2011.7

R. C. Whaley and J. J. Dongarra, Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, pp.1-27, 1998.
DOI : 10.1109/SC.1998.10004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.3487

J. R. Wallis, R. P. Kendall, and T. E. Little, Constrained Residual Acceleration of Conjugate Residual Methods, SPE Reservoir Simulation Symposium, 1985.
DOI : 10.2118/13536-MS

S. Williams, A. Waterman, and D. Patterson, Roofline: An Insightful Visual Performance Model for Multicore Architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785