E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009.
DOI : 10.1088/1742-6596/180/1/012037

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: a portable linear algebra library for high-performance computers, Proceedings Supercomputing '90, pp.2-11, 1990.

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz, Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, 2012.
DOI : 10.1145/2312005.2312044

R. Barbulescu, C. Bouvier, J. Detrey, P. Gaudry, H. Jeljeli et al., Discrete Logarithm in GF(2^809) with FFS, Public-Key Cryptography - PKC 2014 - 17th International Conference on Practice and Theory in Public-Key Cryptography, pp.221-238, 2014.
DOI : 10.1007/978-3-642-54631-0_13

URL : https://hal.archives-ouvertes.fr/hal-00818124

A. R. Benson and G. Ballard, A framework for practical parallel fast matrix multiplication, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pp.42-53, 2015.
DOI : 10.1145/2688500.2688513

URL : http://arxiv.org/pdf/1409.2908

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK Users' Guide, 1997.
DOI : 10.1137/1.9780898719642

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: Efficient Multithreaded Computing, 1996.
DOI : 10.1006/jpdc.1996.0107

OpenMP Architecture Review Board, OpenMP Application Program Interface Version 3.0, 2008.

OpenMP Architecture Review Board, OpenMP Application Program Interface Version 4.0, 2013.

W. Bosma, J. J. Cannon, and C. Playoust, The Magma Algebra System I: The User Language, Journal of Symbolic Computation, vol.24, issue.3-4, pp.235-265, 1996.
DOI : 10.1006/jsco.1996.0125

W. Bosma, J. Cannon, and C. Playoust, The Magma algebra system. I. The user language, Computational algebra and number theory (London, 1993), J. Symbolic Comput, vol.24, issue.3-4, pp.235-265, 1997.

N. Bourbaki, Groupes et Algèbres de Lie. Elements of Mathematics, Chapters 4-6, 2008.

B. Boyer, A. Breust, J. Dumas, P. Giorgi, C. Pernet et al., FFLAS-FFPACK: Finite Field Linear Algebra Subroutines

B. Boyer, J. Dumas, C. Pernet, P. Giorgi, and B. D. Saunders, LinBox-1.4.0: Exact Computational Linear Algebra. url: https

B. Boyer, J. Dumas, and P. Giorgi, Exact sparse matrix-vector multiplication on GPU's and multicore architectures, Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO '10, pp.80-88, 2010.
DOI : 10.1145/1837210.1837224

URL : http://hal.archives-ouvertes.fr/docs/00/47/51/85/PDF/ffspmv.pdf

B. Boyer, J. Dumas, C. Pernet, and W. Zhou, Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm, Proceedings of the 2009 international symposium on Symbolic and algebraic computation, ISSAC '09, pp.55-62, 2009.
DOI : 10.1145/1576702.1576713

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, IWOMP 2012, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8

URL : https://hal.archives-ouvertes.fr/hal-00796253

F. Bruhat, Sur les représentations induites des groupes de Lie, Bulletin de la Société mathématique de France, vol.84, pp.97-205, 1956.
DOI : 10.24033/bsmf.1469

J. R. Bunch and J. E. Hopcroft, Triangular factorization and inversion by fast matrix multiplication, Mathematics of Computation, vol.28, issue.125, pp.231-236, 1974.
DOI : 10.21236/ad0754790

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

Z. Chen and A. Storjohann, A BLAS based C library for exact linear algebra on integer matrices, Proceedings of the 2005 international symposium on Symbolic and algebraic computation, ISSAC '05, pp.92-99, 2005.
DOI : 10.1145/1073884.1073899

F. Dahlgren and J. Torrellas, Cache-only memory architectures, Computer, vol.32, issue.6, pp.72-79, 1999.
DOI : 10.1109/2.769448

URL : http://iacoma.cs.uiuc.edu/iacoma-papers/encyclopedia_coma.pdf

P. D'Alberto, M. Bodrato, and A. Nicolau, Exploiting Parallelism in Matrix-computation Kernels for Symmetric Multiprocessor Systems: Matrix-multiplication and Matrix-addition Algorithm Optimizations by Software Pipelining and Threads Allocation, ACM Trans. Math. Softw, vol.38, issue.2, pp.1-2, 2011.

J. Della Dora, Sur quelques algorithmes de recherche de valeurs propres, 1973.
URL : https://hal.archives-ouvertes.fr/tel-00010274

S. Donfack, J. Dongarra, M. Faverge, M. Gates, J. Kurzak et al., A survey of recent developments in parallel implementations of Gaussian elimination, Concurrency and Computation: Practice and Experience, pp.1292-1309, 2015.
DOI : 10.1109/CLUSTR.2007.4629221

URL : https://hal.archives-ouvertes.fr/hal-00986948

J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, Numerical Linear Algebra for High-Performance Computers, 1998.
DOI : 10.1137/1.9780898719611

J. J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization, Concurrency and Computation: Practice and Experience, vol.26, issue.7, 2014.
DOI : 10.1002/cpe.3110

URL : https://hal.archives-ouvertes.fr/hal-00865472

C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm, Journal of Computational Physics, vol.110, issue.1, pp.1-10, 1994.
DOI : 10.1006/jcph.1994.1001

I. S. Duff, A. M. Erisman, and J. K. Reid, Direct methods for sparse matrices, 1986.
DOI : 10.1093/acprof:oso/9780198508380.001.0001

J. Dumas, L. Fousse, and B. Salvy, Simultaneous modular reduction and Kronecker substitution for small finite fields, Journal of Symbolic Computation (Special Issue in Honour of Keith Geddes on his 60th Birthday), vol.46, issue.7, pp.823-840, 2011.

J. Dumas, T. Gautier, P. Giorgi, J. Roch, and G. Villard, Givaro-4.0.1: une bibliothèque C++ pour le Calcul Formel. Software IMAG-LMC, 2004.

J. Dumas, T. Gautier, C. Pernet, J. Roch, and Z. Sultan, Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination, Parallel Computing, vol.57, 2015.
DOI : 10.1016/j.parco.2015.10.003

URL : https://hal.archives-ouvertes.fr/hal-01084238

J. Dumas, T. Gautier, C. Pernet, and Z. Sultan, Parallel Computation of Echelon Forms, Euro-Par 2014 Parallel Processing -20th International Conference, pp.499-510, 2014.
DOI : 10.1007/978-3-319-09873-9_42

URL : https://hal.archives-ouvertes.fr/hal-00947013

J. Dumas, P. Giorgi, and C. Pernet, FFPACK, Proceedings of the 2004 international symposium on Symbolic and algebraic computation, ISSAC '04, pp.119-126, 2004.
DOI : 10.1145/1005285.1005304

URL : https://hal.archives-ouvertes.fr/hal-00018223

J. Dumas, P. Giorgi, and C. Pernet, Dense Linear Algebra over Word-Size Prime Fields, ACM Transactions on Mathematical Software, vol.35, issue.3, pp.1-19, 2008.
DOI : 10.1145/1391989.1391992

URL : https://hal.archives-ouvertes.fr/hal-00018223

J. Dumas and C. Pernet, Computational linear algebra over finite fields, In: Handbook of Finite Fields, 2013.

J. Dumas, C. Pernet, J. Decker, M. Dewar, E. Kaltofen et al., Adaptive Triangular System Solving, In: Challenges in Symbolic Computation Software, Dagstuhl Seminar Proceedings 06271. Dagstuhl, Germany: Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, 2006.

J. Dumas, C. Pernet, and Z. Sultan, Simultaneous computation of the row and column rank profiles, Proceedings of the 38th international symposium on International symposium on symbolic and algebraic computation, ISSAC '13, 2013.
DOI : 10.1145/2465506.2465517

URL : https://hal.archives-ouvertes.fr/hal-00778136

J. Dumas, C. Pernet, and Z. Sultan, Computing the Rank Profile Matrix, Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '15, pp.149-156, 2015.
DOI : 10.1007/978-3-642-15274-0_16

URL : https://hal.archives-ouvertes.fr/hal-01107722

J. Dumas, C. Pernet, and Z. Wan, Efficient computation of the characteristic polynomial, Proceedings of the 2005 international symposium on Symbolic and algebraic computation, ISSAC '05, pp.140-147, 2005.
DOI : 10.1145/1073884.1073905

URL : https://hal.archives-ouvertes.fr/hal-00004056

J. Dumas and J. Roch, On parallel block algorithms for exact triangularizations, Parallel Computing, vol.28, issue.11, pp.1531-1548, 2002.
DOI : 10.1016/S0167-8191(02)00161-8

URL : http://www-id.imag.fr/~jgdumas/Publications/Turbo.ps.gz

J. Faugère, A new efficient algorithm for computing Gröbner bases (F4), Journal of Pure and Applied Algebra, vol.139, issue.1-3, pp.61-88, 1999.

J. von zur Gathen and J. Gerhard, Modern Computer Algebra, 1999.

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.27-28, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00727795

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1299-1308, 2013.
DOI : 10.1109/IPDPS.2013.66

URL : https://hal.archives-ouvertes.fr/hal-00799904

T. Gautier, J. Roch, Z. Sultan, and B. Vialla, Parallel algebraic linear algebra dedicated interface, Proceedings of the 2015 International Workshop on Parallel Symbolic Computation, PASCO '15, pp.34-43, 2015.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-01221106

K. Goto and R. A. van de Geijn, Anatomy of high-performance matrix multiplication, ACM Transactions on Mathematical Software, vol.34, issue.3, pp.1-12, 2008.
DOI : 10.1145/1356052.1356053

URL : http://www.cs.utexas.edu/users/flame/pubs/GOTO_TOMS.ps

D. Y. Grigor'ev, Analogy of Bruhat decomposition for the closure of a cone of Chevalley group of a classical series, In: Soviet Mathematics Doklady, vol.23, issue.2, pp.393-397, 1981.

D. Y. Grigor'ev, Additive complexity in directed computations, In: Theoretical Computer Science, vol.19, issue.1, pp.39-67, 1982.

L. Grigori, J. W. Demmel, and H. Xiang, CALU: A Communication Optimal LU Factorization Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.4, pp.1317-1350, 2011.
DOI : 10.1137/100788926

URL : https://hal.archives-ouvertes.fr/hal-00651137

F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997.
DOI : 10.1147/rd.416.0737

F. G. Gustavson, A. Henriksson, I. Jonsson, B. Kagstrom, and P. Ling, Recursive blocked data formats and BLAS's for dense linear algebra algorithms, In: PARA. Ed. by B. Kagstrom, J. Dongarra, E. Elmroth, and J. Wasniewski. Lecture Notes in Computer Science, vol.1541, pp.195-206, 1998.
DOI : 10.1007/BFb0095337

W. Hart, F. Johansson, and S. Pancratz, FLINT: Fast Library for Number Theory
DOI : 10.1007/978-3-642-15582-6_18

URL : http://wrap.warwick.ac.uk/41629/1/WRAP_Hart_0584144-ma-270913-flint-extended-abstract.pdf

M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu et al., An overview of the Trilinos project, ACM Transactions on Mathematical Software, vol.31, issue.3, pp.397-423, 2005.
DOI : 10.1145/1089014.1089021

O. H. Ibarra, S. Moran, and R. Hui, A generalization of the fast LUP matrix decomposition algorithm and applications, Journal of Algorithms, vol.3, issue.1, pp.45-56, 1982.
DOI : 10.1016/0196-6774(82)90007-4

Intel, Intel Math Kernel Library, 2007.

C. Jeannerod, C. Pernet, and A. Storjohann, Rank-profile revealing Gaussian elimination and the CUP matrix decomposition, Journal of Symbolic Computation, vol.56, 2013.
DOI : 10.1016/j.jsc.2013.04.004

URL : https://hal.archives-ouvertes.fr/hal-00655543

D. J. Jeffrey, LU factoring of non-invertible matrices, ACM SIGSAM Bulletin, vol.44, issue.1/2, pp.1-8, 2010.
DOI : 10.1145/1838599.1838602

J. Jelinek, The GNU OpenMP implementation. 2014. url: https

W. Keller-Gehrig, Fast algorithms for the characteristic polynomial, Theoretical Computer Science, vol.36, pp.309-317, 1985.
DOI : 10.1016/0304-3975(85)90049-0

C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, 1995.

K. Klimkowski and R. A. van de Geijn, Anatomy of a Parallel Out-of-Core Dense Linear Solver, In: ICPP, vol.3, pp.29-33, 1995.

B. Kumar, C. Huang, R. Johnson, and P. Sadayappan, A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction, Proceedings of the Seventh International Parallel Processing Symposium, pp.582-588, 1993.

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010.
DOI : 10.1137/1.9781611971446

J. Kurzak, P. Luszczek, A. Yarkhan, M. Faverge, J. Langou et al., Multithreading in the PLASMA Library, Multicore Computing: Algorithms, Architectures, and Applications, p.119, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00809774

F. Le Gall, Powers of tensors and fast matrix multiplication, Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pp.296-303, 2014.
DOI : 10.1145/2608628.2608664

G. I. Malaschonok, Fast Generalized Bruhat Decomposition, CASC'10, pp.194-202, 2010.
DOI : 10.1109/SFCS.1992.267779

W. Manthey and U. Helmke, Bruhat canonical form for linear systems, Linear Algebra and its Applications, vol.425, issue.2-3, pp.2-3, 2007.
DOI : 10.1016/j.laa.2007.01.022

E. Miller and B. Sturmfels, Combinatorial commutative algebra, 2005.

A. Novocin, D. Stehlé, and G. Villard, An LLL-reduction algorithm with quasi-linear time complexity, Proceedings of the 43rd annual ACM symposium on Theory of computing, STOC '11, pp.403-412, 2011.
DOI : 10.1145/1993636.1993691

URL : https://hal.archives-ouvertes.fr/ensl-00534899

J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero, Elemental, ACM Transactions on Mathematical Software, vol.39, issue.2, pp.13:1-13:24, 2013.
DOI : 10.1145/2427023.2427030

V. Shoup, A library for Number Theory

J. G. Siek and A. Lumsdaine, The matrix template library: A generic programming approach to high performance numerical linear algebra, In: Computing in Object-Oriented Parallel Environments, pp.59-70, 1998.

W. Stein, Modular Forms, a Computational Approach, Graduate Studies in Mathematics, 2007.

A. Storjohann, Algorithms for Matrix Canonical Forms, pp.10-3929, 2000.

V. Strassen, Gaussian elimination is not optimal, Numerische Mathematik, vol.13, issue.4, pp.354-356, 1969.
DOI : 10.1007/BF02165411

S. Toledo, Locality of Reference in LU Decomposition with Partial Pivoting, SIAM Journal on Matrix Analysis and Applications, vol.18, issue.4, pp.1065-1081, 1997.
DOI : 10.1137/S0895479896297744

G. Villard, Calcul formel et parallélisme: résolution de systèmes linéaires, 1988.

E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu et al., Intel Math Kernel Library, High-Performance Computing on the Intel Xeon Phi, pp.167-188, 2014.
DOI : 10.1007/978-3-319-06486-4_7

Q. Wang, X. Zhang, Y. Zhang, and Q. Yi, AUGEM, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-25, 2013.
DOI : 10.1145/2503210.2503219

R. C. Whaley, A. Petitet, and J. J. Dongarra, Automated empirical optimizations of software and the ATLAS project, Parallel Computing (New Trends in High Performance Computing), vol.27, issue.1-2, pp.3-35, 2001.

V. V. Williams, Multiplying matrices faster than Coppersmith-Winograd, Proceedings of the 44th symposium on Theory of Computing, STOC '12, pp.887-898, 2012.
DOI : 10.1145/2213977.2214056

S. Winograd, On multiplication of 2 × 2 matrices, Linear Algebra and its Applications, vol.4, issue.4, pp.381-388, 1971.
DOI : 10.1016/0024-3795(71)90009-7

Z. Xianyi, W. Qian, and Z. Yunquan, Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp.684-691, 2012.
DOI : 10.1109/ICPADS.2012.97

X. Chang, D. Stehlé, and G. Villard, Perturbation analysis of the QR factor R in the context of LLL lattice basis reduction, Mathematics of Computation, 2012.
DOI : 10.1090/S0025-5718-2012-02545-2