QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00547847
Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01372022
Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.1021-1030, 2016. ,
DOI : 10.1109/IPDPS.2016.90
URL : https://hal.archives-ouvertes.fr/hal-01223573
Task-based FMM for heterogeneous architectures. Concurrency and Computation: Practice and Experience, 2016. ,
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
PLAPACK, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) , Supercomputing '97, pp.1-16, 1997. ,
DOI : 10.1145/509593.509622
LAPACK: A Portable Linear Algebra Library for High-performance Computers, Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, Supercomputing '90, pp.2-11, 1990. ,
Scheduling Tasks over Multicore machines enhanced with accelerators: a Runtime System's Perspective. Theses, Université Bordeaux 1. URL https, 2011. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures. In International Euro-Par Workshops HPPC'09, volume 6043 of Lecture Notes in Computer Science, pp.56-65, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00421333
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009. ,
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363
Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009. ,
DOI : 10.1109/ISPASS.2009.4919648
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.507.8371
Constraint-based scheduling: applying constraint programming to scheduling problems, volume 39, 2012. ,
DOI : 10.1007/978-1-4615-1479-4
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-66, 2012. ,
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715
Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016. ,
DOI : 10.1109/HiPC.2016.045
URL : https://hal.archives-ouvertes.fr/hal-01361992
Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs. Working paper or preprint. URL https, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01386174
ScaLAPACK User's Guide, Society for Industrial and Applied Mathematics, 1997. ,
DOI : 10.1137/1.9780898719642
Scheduling Data Flow Program in XKaapi: A New Affinity Based Algorithm for Heterogeneous Architectures, pp.560-571, 2014. ,
DOI : 10.1007/978-3-319-09873-9_47
URL : https://hal.archives-ouvertes.fr/hal-01081629
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, IEEE Transactions on Parallel and Distributed Systems, 2016. ,
DOI : 10.1109/TPDS.2017.2675891
URL : https://hal.archives-ouvertes.fr/hal-01516752
Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, vol.20, issue.4, pp.1625-1638, 2015. ,
DOI : 10.1002/cpe.3359
URL : https://hal.archives-ouvertes.fr/hal-01081625
Cilk, ACM SIGPLAN Notices, vol.30, issue.8, pp.207-216, 1995. ,
DOI : 10.1145/209937.209958
Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999. ,
DOI : 10.1145/324133.324234
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.7695
Scheduling unrelated machines of few different types, 2012. ,
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011. ,
DOI : 10.1109/IPDPS.2011.299
PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, Computing in Science and Engineering, 2013. ,
DOI : 10.1109/mcse.2013.98
URL : https://hal.archives-ouvertes.fr/hal-00930217
A critical path approach to analyzing parallelism of algorithmic variants. Application to Cholesky inversion, 2010. ,
Tiled algorithms for matrix computations on multicore architectures, PhD thesis, 2012. ,
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Complexity results for scheduling problems. Web document, 2009. ,
LAPACK Working Note 191: A class of parallel tiled linear algebra algorithms for multicore architectures, 2007. ,
Parallel tiled QR factorization for multicore architectures. Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008. ,
DOI : 10.1007/978-3-540-68111-3_67
URL : http://arxiv.org/abs/0707.3548
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
SimGrid: a Generic Framework for Large-Scale Distributed Experiments, 10th IEEE International Conference on Computer Modeling and Simulation, 2008. ,
DOI : 10.1109/uksim.2008.28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.182.5943
SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming , PPoPP '08, pp.123-132, 2008. ,
DOI : 10.1145/1345206.1345227
Cholesky-based reduced-rank square-root Kalman filtering, 2008 American Control Conference, pp.3987-3992, 2008. ,
DOI : 10.1109/ACC.2008.4587116
Dynamic scheduling of real-time tasks under precedence constraints. Real-Time Systems, pp.181-194, 1990. ,
DOI : 10.1007/bf00365326
Exploiting two-level parallelism by aggregating computing resources in task-based applications over accelerator-based machines. Inria technical report, Inria. URL https, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01502749
Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010. ,
DOI : 10.1109/MASCOTS.2010.43
SLC: Symbolic scheduling for executing parameterized task graphs on multiprocessors, Proceedings of the 1999 International Conference on Parallel Processing, pp.413-421, 1999. ,
DOI : 10.1109/ICPP.1999.797429
URL : https://hal.archives-ouvertes.fr/inria-00098842
LAPACK Working Note 95: ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance, 1995. ,
HPC Matters to our Quality of Life and Prosperity, 2014. ,
The LINPACK Benchmark: An explanation, Proceedings of the 1st International Conference on Supercomputing, pp.456-474, 1988. ,
DOI : 10.1007/3-540-18991-2_27
OmpSs: A proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters, pp.173-193, 2011. ,
Scheduling Parallel Tasks: Approximation Algorithms. Handbook of Scheduling: Algorithms, Models, and Performance Analysis, URL, vol.26, pp.26-27, 2004. ,
EISPACK - A package of matrix eigensystem routines, Computer Physics Communications, vol.7, issue.4, pp.179-184, 1974. ,
DOI : 10.1016/0010-4655(74)90086-1
Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979. ,
A PTAS for Scheduling Unrelated Machines of Few Different Types, pp.290-301, 2016. ,
DOI : 10.1007/978-3-662-49192-8_24
Fast direct solvers for elliptic partial differential equations. Theses, University of Colorado. URL https://amath.colorado, 2011. ,
Bounds for Certain Multiprocessing Anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966. ,
DOI : 10.1002/j.1538-7305.1966.tb01709.x
Bounds on Multiprocessing Timing Anomalies, SIAM Journal on Applied Mathematics, vol.17, issue.2, pp.416-429, 1969. ,
DOI : 10.1137/0117039
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.8131
High-performance linear algebra algorithms using new generalized data structures for matrices, IBM Journal of Research and Development, vol.47, issue.1, pp.31-55, 2003. ,
DOI : 10.1147/rd.471.0031
The Monte Carlo Framework, Examples from Finance and Generating Correlated Random Variables. Course Notes. URL http://www.columbia, 2004. ,
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, pp.235-246, 2010. ,
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448
Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, 2002. ,
DOI : 10.1137/1.9780898718027
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012. ,
DOI : 10.1016/j.jpdc.2011.10.014
Scheduling Problems on Two Sets of Identical Machines, Computing, vol.70, issue.4, pp.277-294, 2003. ,
DOI : 10.1007/s00607-003-0011-9
Exploiting asynchrony from exact forward recovery for DUE in iterative solvers, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp.1-53, 2015. ,
DOI : 10.1145/2807591.2807599
The High Performance Fortran Handbook. Scientific and engineering computation, 1994. ,
DOI : 10.1063/1.4823319
Approximation algorithms for scheduling unrelated parallel machines, 1990. ,
DOI : 10.1109/sfcs.1987.8
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.708
Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '16, pp.1-9, 2016. ,
DOI : 10.1145/2929908.2929920
An efficient dynamic scheduling algorithm for multiprocessor real-time systems. IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.3, pp.312-319, 1998. ,
DOI : 10.1109/71.674322
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3939
How HPC Impacts Our Lives II: HPC (and Linux) in the Movies. URL https, 2016. ,
First US Exascale Supercomputer Now On Track for 2021. URL https, 2016. ,
China will deploy exascale prototype this year. URL https, 2017. ,
Performance of Greedy Ordering Heuristics for Sparse Cholesky Factorization, SIAM Journal on Matrix Analysis and Applications, vol.20, issue.4, pp.902-914, 1999. ,
DOI : 10.1137/S0895479897319313
Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009. ,
DOI : 10.1177/1094342009106195
A makespan lower bound for the scheduling of the tiled Cholesky factorization based on ALAP scheduling, 2015. ,
Solving dense linear systems on platforms with multiple hardware accelerators, PPOPP'09, pp.121-130, 2009. ,
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, p.3614, 2009. ,
DOI : 10.1145/1527286.1527288
Distributed Sparse Matrix Factorization: QR and Cholesky Decompositions, PhD thesis, pp.92-14255, 1992. ,
A latency tolerant hybrid sparse solver using incomplete Cholesky factorization. Numerical Linear Algebra with Applications, pp.541-560, 2003. ,
DOI : 10.1002/nla.327
On the simulation of large-scale architectures using multiple application abstraction levels, ACM Transactions on Architecture and Code Optimization, vol.8, issue.4, 2012. ,
DOI : 10.1145/2086696.2086715
Why do supercomputers matter for your everyday life? URL https://ec.europa.eu/digital-single-market/en/blog/why-do-supercomputers-matter-your-everyday-life, 2015. ,
The structural simulation toolkit, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, pp.37-42, 2011. ,
DOI : 10.1145/1964218.1964225
An Efficient Block-Oriented Approach to Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.15, issue.6, pp.1413-1439, 1994. ,
DOI : 10.1137/0915085
The design and implementation of a new out-of-core sparse Cholesky factorization method, ACM Transactions on Mathematical Software, vol.30, issue.1, pp.19-46, 2004. ,
DOI : 10.1145/974781.974783
Partitioning and scheduling parallel programs for multiprocessing. Research monographs in parallel and distributed computing, 1989. ,
Scheduling task graphs optimally with A*, The Journal of Supercomputing, vol.51, issue.3, pp.310-332, 2010. ,
An optimal rounding gives a better approximation for scheduling unrelated machines. Operations Research Letters, 2005. ,
DOI : 10.1016/j.orl.2004.05.004
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, 20th International Conference on Parallel Processing, 2014. ,
DOI : 10.1007/978-3-319-09873-9_5
URL : https://hal.archives-ouvertes.fr/hal-01011633
The Two-dimensional Block-Cyclic Distribution, 1997. ,
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, 2010. ,
DOI : 10.1016/j.parco.2009.12.005
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.5082
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
Multi2Sim, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.335-344, 2012. ,
DOI : 10.1145/2370816.2370865
Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015. ,
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359
Immediate Mode Scheduling of Independent Jobs in Computational Grids, 21st International Conference on Advanced Networking and Applications (AINA '07), pp.970-977, 2007. ,
DOI : 10.1109/AINA.2007.78