The Landscape of Parallel Computing Research: A View from Berkeley, Technical Report UCB/EECS-2006-183, 2006.
Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.7490, issue.9, 2014.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674
LAPACK: A portable linear algebra library for high-performance computers, Proceedings SUPERCOMPUTING '90, pp.2-11, 1990.
DOI : 10.1109/SUPERC.1990.129995
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
The Design of OpenMP Tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009.
DOI : 10.1109/TPDS.2008.105
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries, Scientific Programming, pp.95-104, 2003.
StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363
Task-Based Programming for Seismic Imaging: Preliminary Results, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp.1259-1266, 2014.
DOI : 10.1109/HPCC.2014.205
URL : https://hal.archives-ouvertes.fr/hal-01057580
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and Phd Forum PDSEC 2011, pp.1432-1441, 2011.
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98
Extending OpenMP for NUMA machines, Supercomputing, ACM/IEEE, p.48, 2000.
Domain decomposition: parallel multilevel methods for elliptic partial differential equations, 2004.
Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '95, pp.207-216, 1995.
Productive Cluster Programming with OmpSs, Proceedings of the 17th international conference on Parallel processing - Volume Part I, Euro-Par '11, pp.555-566, 2011.
DOI : 10.1147/rd.515.0593
Matrix Market: A web resource for test matrix collections, Proceedings of the IFIP TC2/WG2.5 Working Conference on Quality of Numerical Software: Assessment and Enhancement, pp.125-137, 1997.
De l'exécution d'applications scientifiques OpenMP sur architectures hiérarchiques, 2010.
DOI : 10.1007/978-3-540-79561-2_15
URL : https://hal.inria.fr/inria-00329934/document
Tenth SPE Comparative Solution Project: A Comparison of Upscaling Techniques, SPE Reservoir Evaluation & Engineering, vol.4, issue.4, pp.308-317, 2001.
DOI : 10.2118/72469-PA
PT-Scotch: A tool for efficient parallel graph ordering, Parallel Computing, vol.34, issue.6-8, pp.318-331, 2008.
DOI : 10.1016/j.parco.2007.12.001
URL : https://hal.archives-ouvertes.fr/hal-00402893
Fine-Grained Parallel Incomplete LU Factorization, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.169-193, 2015.
DOI : 10.1137/140968896
HMPP: A hybrid multi-core parallel programming environment, Workshop on General Purpose Processing on Graphics Processing Units, 2007.
Achieving numerical accuracy and high performance using recursive tile LU factorization, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809765
The effect of ordering on preconditioned conjugate gradients, BIT, vol.29, issue.4, pp.635-657, 1989.
DOI : 10.1007/BF01932738
On parallelism and convergence of Incomplete LU factorizations, Appl. Numer. Math., vol.7, issue.5, pp.417-436, 1991.
Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-9, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172
X-kaapi: A Multi Paradigm Runtime for Multicore Architectures, 2013 42nd International Conference on Parallel Processing, 2012.
DOI : 10.1109/ICPP.2013.86
URL : https://hal.archives-ouvertes.fr/hal-00727827
A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors, Journal of Parallel and Distributed Computing, vol.16, issue.4, pp.276-291, 1992.
DOI : 10.1016/0743-7315(92)90012-C
Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Computing Surveys, vol.31, issue.4, pp.406-471, 1999.
DOI : 10.1145/344588.344618
Multi-Constraint Mesh Partitioning for Contact/Impact Computations, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, p.56, 2003.
DOI : 10.1145/1048935.1050206
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.359-392, 1998.
DOI : 10.1137/S1064827595287997
A NUMA API for Linux, Novell Inc., 2005.
A Comparison of Multiprocessor Scheduling Heuristics, 1994 International Conference on Parallel Processing (ICPP'94), pp.243-250, 1994.
DOI : 10.1109/ICPP.1994.19
Affinity-On-Next-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010.
DOI : 10.1007/978-3-642-14390-8_60
The Cilk++ concurrency platform, The Journal of Supercomputing, vol.51, pp.522-527, 2009.
Affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.387-392, 2005.
Basic Linear Algebra Subprograms for Fortran Usage, ACM Transactions on Mathematical Software, vol.5, issue.3, pp.308-323, 1979.
DOI : 10.1145/355841.355847
Generalized nested dissection, SIAM Journal on Numerical Analysis, vol.16, issue.2, pp.346-358, 1979.
A task-based H-matrix solver for acoustic and electromagnetic problems on multicore architectures, SciCADE, the International Conference on Scientific Computation and Differential Equations, 2013.
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2014.
Performance Evaluation of Multithreaded Sparse Matrix-Vector Multiplication Using OpenMP, 2009 11th IEEE International Conference on High Performance Computing and Communications, pp.659-665, 2009.
DOI : 10.1109/HPCC.2009.75
STREAM: Sustainable memory bandwidth in high performance computers, a continually updated technical report, 1991.
Capsules: Expressing composable computations in a parallel programming model, Languages and Compilers for Parallel Computing, pp.276-291, 2008.
Advanced Configuration and Power Interface: Open Standard, Operating System Sleep Mode, Hibernate (OS Feature), Synonym, 2009.
A comparison of some recent task-based parallel programming models, 3rd Workshop on Programmability Issues for Multi-Core Computers, 2010.
Sparse matrix ordering with Scotch, High-Performance Computing and Networking, pp.370-378, 1997.
DOI : 10.1007/BFb0031609
Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas, High Performance Computing for Computational Science - VECPAR 2010, pp.279-292, 2011.
DOI : 10.1016/j.jpdc.2004.03.006
URL : https://hal.archives-ouvertes.fr/hal-00788872
Intel® Xeon Phi™ Coprocessor Architecture and Tools: The Guide for Application Developers, 2013.
Intel Threading Building Blocks, 2007.
A NUMA-Aware Fine Grain Parallelization Framework for Multi-core Architecture, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1381-1390, 2013.
DOI : 10.1109/IPDPSW.2013.204
URL : https://hal.archives-ouvertes.fr/hal-00858350
Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.427-436, 2009.
DOI : 10.1109/PDP.2009.43
Memory affinity for hierarchical shared memory multiprocessors, 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'09), pp.59-66, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00788914
Optimisation du produit matrice-vecteur creux sur architecture GPU pour un simulateur de réservoir, 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013.
ILUT: A dual threshold incomplete LU factorization, Numerical Linear Algebra with Applications, pp.387-402, 1994.
Iterative Methods for Sparse Linear Systems, PWS, 1996.
DOI : 10.1137/1.9780898718003
Dynamic grain-size adaptation on object oriented parallel programming: the SCOOPP approach, Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing, IPPS '99/SPDP '99, pp.728-732, 1999.
A flexible thread scheduler for hierarchical multiprocessor machines, Second International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2), 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000138
Task scheduling algorithms for heterogeneous processors, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99), 1999.
DOI : 10.1109/HCW.1999.765092
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, 2007.
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506
A Unified Scheduler for Recursive and Task Dataflow Parallelism, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.1-11, 2011.
DOI : 10.1109/PACT.2011.7
Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, pp.1-27, 1998.
DOI : 10.1109/SC.1998.10004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.3487
Constrained Residual Acceleration of Conjugate Residual Methods, SPE Reservoir Simulation Symposium, 1985.
DOI : 10.2118/13536-MS
Roofline, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785