L. Hodgkin, A History of Mathematics: From Mesopotamia to Modernity, 2005.

A. Tucker, The Growing Importance of Linear Algebra in Undergraduate Mathematics, The College Mathematics Journal, vol.24, issue.1, pp.3-9, 1993.
DOI : 10.2307/2686426

G. E. Moore, Cramming More Components Onto Integrated Circuits, Proceedings of the IEEE, vol.86, issue.1, pp.82-85, 1998 (reprinted from Electronics, 1965).
DOI : 10.1109/JPROC.1998.658762

J. K. Prentice, A Quantum Mechanical Theory for the Scattering of Low-Energy Atoms from Incommensurate Crystal Surface Layers, 1992.

A. Edelman, The first annual large dense linear system survey, ACM SIGNUM Newsletter, vol.26, issue.4, pp.6-12, 1991.
DOI : 10.1145/122645.122648

X. Li and J. Demmel, SuperLU_DIST, ACM Transactions on Mathematical Software, vol.29, issue.2, pp.110-140, 2003.
DOI : 10.1145/779359.779361

P. Amestoy, J.-Y. L'Excellent, F.-H. Rouet, and M. Sid-Lakhdar, Modeling 1D Distributed-Memory Dense Kernels for an Asynchronous Multifrontal Sparse Solver, High-Performance Computing for Computational Science, VECPAR 2014, 2014.
DOI : 10.1007/978-3-319-17353-5_14

URL : https://hal.archives-ouvertes.fr/hal-01355356

C. Fu, X. Jiao, and T. Yang, Efficient sparse LU factorization with partial pivoting on distributed memory architectures, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.2, pp.109-125, 1998.

G. H. Golub and C. F. Van Loan, Matrix Computations, 1996.

J. Demmel, L. Grigori, M. F. Hoemmen, and J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, 2008. Current version available in the arXiv at http://arxiv

L. Grigori, J. Demmel, and H. Xiang, CALU: A Communication Optimal LU Factorization Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.4, pp.1317-1350, 2011.
DOI : 10.1137/100788926

URL : https://hal.archives-ouvertes.fr/hal-00651137

S. Donfack, L. Grigori, and A. K. Gupta, Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470348

L. Grigori, J. Demmel, and H. Xiang, Communication Avoiding Gaussian elimination, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.29, 2008.
DOI : 10.1109/SC.2008.5214287

URL : https://hal.archives-ouvertes.fr/inria-00277901

D. S. Parker, Random Butterfly Transformations with Applications in Computational Linear Algebra, 1995.

D. S. Parker and B. Pierce, The randomizing FFT: an alternative to pivoting in Gaussian elimination, 1995.

J. von Neumann, First Draft of a Report on the EDVAC, University of Pennsylvania, June 1945. Report prepared for U.S. Army Ordnance Department under Contract W-670-ORD-4926.

M. Flynn, Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972.
DOI : 10.1109/TC.1972.5009071

H. Sutter, The free lunch is over: A fundamental turn toward concurrency in software, Dr. Dobb's Journal, pp.202-210, 2005.

W. Stallings, Computer Organization and Architecture: Designing for Performance (7th ed.), 2006.

B. I. Witt, M65MP, Proceedings of the 1968 23rd ACM national conference, pp.691-703, 1968.
DOI : 10.1145/800186.810634

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition, 2006.

NVIDIA, Compute Unified Device Architecture Programming Guide, 2007.

G. Chrysos, Intel® Xeon Phi™ Coprocessor - the Architecture, Intel Whitepaper, 2014.

J. Fang, H. Sips, L. Zhang, C. Xu, Y. Che et al., Test-driving Intel Xeon Phi, Proceedings of the 5th ACM/SPEC international conference on Performance engineering, ICPE '14, pp.137-148, 2014.
DOI : 10.1145/2568088.2576799

B. S. Garbow, EISPACK - A package of matrix eigensystem routines, Computer Physics Communications, vol.7, issue.4, pp.179-184, 1974.
DOI : 10.1016/0010-4655(74)90086-1

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic Linear Algebra Subprograms for Fortran Usage, ACM Transactions on Mathematical Software, vol.5, issue.3, pp.308-323, 1979.
DOI : 10.1145/355841.355847

AMD, AMD Core Math Library (ACML). URL http

Intel, Math Kernel Library (MKL). http://www.intel.com/software/products

R. C. Whaley and J. Dongarra, Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, 1997.
DOI : 10.1109/SC.1998.10004

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

K. Goto, GotoBLAS, Texas Advanced Computing Center, 2007.

NVIDIA, CUBLAS Library, NVIDIA Corporation, 2008.

E. Agullo, J. Dongarra, B. Hadri, J. Kurzak, J. Langou et al., PLASMA users' guide, 2009.

A. YarKhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

R. Nath, S. Tomov, and J. Dongarra, An Improved MAGMA GEMM for Fermi Graphics Processing Units, The International Journal of High Performance Computing Applications, vol.24, issue.4, pp.511-515, 2010.
DOI : 10.1177/1094342010385729

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.
DOI : 10.1016/j.parco.2009.12.005

S. Tomov, R. Nath, and J. Dongarra, Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, Parallel Computing, vol.36, issue.12, pp.645-654, 2010.
DOI : 10.1016/j.parco.2010.06.001

J. Kurzak and J. J. Dongarra, Implementing linear algebra routines on multicore processors with pipelining and a look ahead, LAPACK Working Note, vol.178, 2006.

F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-755, 1997.
DOI : 10.1147/rd.416.0737

M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy et al., A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, International Conference on Computational Science, pp.17-26, 2012.
DOI : 10.1016/j.procs.2012.04.003

URL : https://hal.archives-ouvertes.fr/hal-00656457

M. Baboulin, J. Dongarra, J. Herrmann, and S. Tomov, Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, 2013.
DOI : 10.1145/2427023.2427025

URL : https://hal.archives-ouvertes.fr/inria-00593306

G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers, 2011.
DOI : 10.1201/EBK1439811924

C. Lameter, Local and remote memory: Memory in a Linux/NUMA system, Linux Symposium (OLS2006), 2006.

A. Rémy, M. Baboulin, M. Sosonkina, and B. Rozoy, Locality optimization on a NUMA architecture for hybrid LU factorization, International Conference on Parallel Computing, pp.153-162, 2013.