A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2010. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing: 19th International Conference, pp.521-532, 2013. ,
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.12037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, 2014. ,
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368
Multifrontal QR Factorization in a Multiprocessor Environment, Numerical Linear Algebra with Applications, vol.3, issue.4, pp.275-300, 1996. ,
DOI : 10.1002/(SICI)1099-1506(199607/08)3:4<275::AID-NLA83>3.0.CO;2-7
The OpenMP API specification for parallel programming, 2012. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, vol.23, issue.2, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, 2011. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Proceedings of the 15th Euro-Par Conference, pp.863-874, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par, pp.851-862, 2009. ,
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.66 ,
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010. ,
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, 2013. ,
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471
Introduction to UPC and Language Specification, 1999. ,
Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC, Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Proceedings, pp.1-14, 2010. ,
Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009. ,
DOI : 10.1109/IISWC.2009.5306797
Harmony: an execution model and runtime for heterogeneous many core systems, HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing, pp.197-200, 2008. ,
Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006. ,
DOI : 10.1109/SC.2006.55
The Design and Implementation of FFTW3, Proceedings of the IEEE, pp.216-231, 2005. ,
DOI : 10.1109/JPROC.2004.840301
The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998. ,
DOI : 10.1145/277652.277725
Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004. ,
DOI : 10.1007/978-3-540-30218-6_19
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1299-1308, 2013. ,
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904
Daubechies wavelets for high performance electronic structure calculations: The BigDFT project, Comptes Rendus Mécanique, vol.339, issue.2-3, pp.339-341, 2011. ,
DOI : 10.1016/j.crme.2010.12.003
MPICH2: A new start for MPI implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, 2002. ,
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par 2010 - Parallel Processing, pp.235-246, 2010. ,
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448
SC '12: SC Conference on High Performance Computing, Networking, Storage and Analysis, 2012. ,
Oversubscription on multicore processors, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-11, 2010. ,
DOI : 10.1109/IPDPS.2010.5470434
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.168.4615
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012. ,
DOI : 10.1016/j.jpdc.2011.10.014
Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11 ,
DOI : 10.1109/SC.2010.49
URL : http://charm.cs.illinois.edu/newPapers/10-16/paper.pdf
Fully dynamic scheduler for numerical computing on multicore processors, LAPACK Working Note 220, 2009. ,
An approximation algorithm for scheduling trees of malleable tasks, European Journal of Operational Research, vol.142, issue.2, pp.242-249, 2002. ,
DOI : 10.1016/S0377-2217(02)00264-3
URL : https://hal.archives-ouvertes.fr/hal-00001546
TBB 3.0 task scheduler improves composability of TBB based solutions, 2010. ,
URL : http://software.intel.com/en-us/blogstbb-30-task-scheduler-improves-composability-of-tbb-based-solutions-part-1
Marcel: Une bibliothèque de processus légers [Marcel: a library of lightweight threads] ,
A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, vol.26, issue.1, pp.80-113, 2007. ,
Composing parallel software efficiently with lithe, ACM SIGPLAN Notices, vol.45, issue.6, pp.376-387, 2010. ,
DOI : 10.1145/1809028.1806639
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.2385
A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993. ,
DOI : 10.1137/0914074
The optimal control approach to generalized multiprocessor scheduling, Algorithmica, vol.15, issue.1, pp.17-49, 1996. ,
DOI : 10.1007/BF01942605
Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism, 2007. ,
Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management, 2014. ,
A New Implementation of Sparse Gaussian Elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982. ,
DOI : 10.1145/356004.356006
MPI-The Complete Reference: The MPI Core, 1998. ,
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, pp.66-73, 2010. ,
DOI : 10.1109/MCSE.2010.69
Coordinating the use of GPU and CPU for improving performance of compute intensive applications, 2009 IEEE International Conference on Cluster Computing and Workshops, pp.1-10, 2009. ,
DOI : 10.1109/CLUSTR.2009.5289193
Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, pp.42-51, 2007. ,
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506
Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors, Lecture Notes in Computer Science, vol.7698, 2013. ,
DOI : 10.1007/978-3-642-35867-8_7
Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-8, 2010. ,
DOI : 10.1109/IPDPSW.2010.5470941
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.157.7245
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.122
Realm: An Event-Based Low-Level Runtime for Distributed Memory Architectures, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.263-276, 2014. ,
DOI : 10.1145/2628071.2628084
Le problème de la composition parallèle: une approche supervisée [The problem of parallel composition: a supervised approach], 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013. ,
Composing multiple StarPU applications over heterogeneous machines: A supervised approach, IPDPS'13 Workshops, workshop on Accelerators and Hybrid Exascale Systems (AsHES), pp.1050-1059, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
Composing multiple StarPU applications over heterogeneous machines: A supervised approach, The International Journal of High Performance Computing Applications, vol.28, issue.3, pp.285-300, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
A Runtime Approach to Dynamic Resource Allocation for Sparse Direct Solvers, 2014 43rd International Conference on Parallel Processing, 2014. ,
DOI : 10.1109/ICPP.2014.57
URL : https://hal.archives-ouvertes.fr/hal-01101054