Y. Liu, S. Mukherjee, N. Nishimura, M. Schanz, W. Ye et al., Recent Advances and Emerging Applications of the Boundary Element Method, Applied Mechanics Reviews, vol.64, issue.3, p.030802, 2011.
DOI : 10.1115/1.4005491
URL : https://hal.archives-ouvertes.fr/hal-01401752

I. Terrasse, Résolution mathématique et numérique des équations de Maxwell instationnaires par une méthode de potentiels retardés, 1993.

G. Sylvand, Équation des Ondes en Acoustique : Accélération des Potentiels Retardés par la Méthode Multipôle Temporelle, 2003.

G. Beer, I. Smith, and C. Duenser, The boundary element method with programming: for engineers and scientists, 2008.

OpenMP Architecture Review Board, OpenMP Application Program Interface, version 3.0, 2008.

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, p.12037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

F. G. Van Zee, E. Chan, R. A. van de Geijn, E. S. Quintana-Ortí et al., The libflame library for dense matrix computations, Computing in Science & Engineering, vol.11, issue.6, pp.56-63, 2009.

X. Lacoste, M. Faverge, G. Bosilca, P. Ramet, and S. Thibault, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611

A. Muraraşu, J. Weidendorfer, and A. Bode, Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation, Euro-Par 2011: Parallel Processing Workshops, pp.345-354, 2012.
DOI : 10.1007/978-3-642-29740-3_39

C. Yang, F. Wang, Y. Du, J. Chen, J. Liu et al., Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing, 2010 IEEE International Conference on Cluster Computing, pp.19-28, 2010.
DOI : 10.1109/CLUSTER.2010.12

G. E. Karniadakis and R. M. Kirby II, Parallel scientific computing in C++ and MPI: a seamless approach to parallel algorithms and their implementation, 2003.

J. H. Wilkinson, C. Reinsch, and F. L. Bauer, Handbook for automatic computation: linear algebra, 1971.
DOI : 10.1007/978-3-642-86940-2

R. W. Vuduc, Automatic performance tuning of sparse matrix kernels, 2003.

E. Im, Optimizing the Performance of Sparse Matrix-Vector Multiplication, 2000.

J. B. White III and P. Sadayappan, On improving the performance of sparse matrix-vector multiplication, Proceedings of the Fourth International Conference on High-Performance Computing, pp.66-71, 1997.

S. Toledo, Improving the memory-system performance of sparse-matrix vector multiplication, IBM Journal of Research and Development, vol.41, issue.6, pp.711-725, 1997.
DOI : 10.1147/rd.416.0711

A. Pinar and M. T. Heath, Improving performance of sparse matrix-vector multiplication, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '99, p.30, 1999.
DOI : 10.1145/331532.331562

E. Im, K. Yelick, and R. Vuduc, Sparsity: Optimization Framework for Sparse Matrix Kernels, International Journal of High Performance Computing Applications, vol.18, issue.1, pp.135-158, 2004.
DOI : 10.1177/1094342004041296

R. W. Vuduc and H.-J. Moon, Fast sparse matrix-vector multiplication by exploiting variable block structure, High Performance Computing and Communications, pp.807-816, 2005.

Y. Saad, Sparskit: a basic tool kit for sparse matrix computations, 1994.

Y. Saad, Iterative methods for sparse linear systems, second edition, 2003.

R. Vuduc, J. W. Demmel, and K. A. Yelick, OSKI: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, p.521, 2005.
DOI : 10.1088/1742-6596/16/1/071

E. Im and K. Yelick, Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY, Computational Science - ICCS 2001, pp.127-136, 2001.
DOI : 10.1007/3-540-45545-0_22

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th National Conference, pp.157-172, 1969.
DOI : 10.1145/800195.805928

J. C. Pichel, D. B. Heras, J. C. Cabaleiro, and F. F. Rivera, Performance optimization of irregular codes based on the combination of reordering and blocking techniques, Parallel Computing, vol.31, issue.8-9, pp.858-876, 2005.
DOI : 10.1016/j.parco.2005.04.012

M. Martone, S. Filippone, S. Tucci, M. Paprzycki, and M. Ganzha, Utilizing recursive storage in sparse matrix-vector multiplication-preliminary considerations, CATA, pp.300-305, 2010.

A. Yzelman, Fast sparse matrix-vector multiplication by partitioning and reordering, 2011.

G. Haase, M. Liebmann, and G. Plank, A Hilbert-order multiplication scheme for unstructured sparse matrices, International Journal of Parallel, Emergent and Distributed Systems, vol.22, issue.4, pp.213-220, 2007.
DOI : 10.1007/s006070070032

S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick et al., Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Parallel Computing, vol.35, issue.3, pp.178-194, 2009.
DOI : 10.1016/j.parco.2008.12.006

J. C. Pichel, D. B. Heras, J. C. Cabaleiro, and F. F. Rivera, Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures, Concurrency and Computation: Practice and Experience, pp.1838-1856, 2009.

B. Vastenhouw and R. H. Bisseling, A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication, SIAM Review, vol.47, issue.1, pp.67-95, 2005.
DOI : 10.1137/S0036144502409019

A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert et al., Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pp.233-244, 2009.

V. Karakasis, G. Goumas, and N. Koziris, A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures, 2009 International Conference on Computational Science and Engineering, pp.247-256, 2009.
DOI : 10.1109/CSE.2009.223

M. M. Baskaran and R. Bordawekar, Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies, IBM Research Report RC24704, 2008.

M. Garland, Sparse matrix computations on manycore GPU's, Proceedings of the 45th annual conference on Design automation, DAC '08, pp.2-6, 2008.
DOI : 10.1145/1391469.1391473

N. Bell and M. Garland, Efficient sparse matrix-vector multiplication on cuda, 2008.

N. Bell and M. Garland, Implementing sparse matrix-vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.18, 2009.
DOI : 10.1145/1654059.1654078

A. Monakov, A. Lokhmotov, and A. Avetisyan, Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures, High Performance Embedded Architectures and Compilers, pp.111-125, 2010.
DOI : 10.1007/978-3-642-11515-8_10

D. Grewe and A. Lokhmotov, Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, p.12, 2011.
DOI : 10.1145/1964179.1964196

R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure, On the limits of GPU acceleration, Proceedings of the 2nd USENIX conference on Hot topics in parallelism, p.13, 2010.

C. Jhurani and P. Mullowney, A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices, Journal of Parallel and Distributed Computing, vol.75, pp.133-140, 2015.
DOI : 10.1016/j.jpdc.2014.09.003

V. Karakasis, G. Goumas, and N. Koziris, Perfomance Models for Blocked Sparse Matrix-Vector Multiplication Kernels, 2009 International Conference on Parallel Processing, pp.356-364, 2009.
DOI : 10.1109/ICPP.2009.21

R. Nishtala, R. W. Vuduc, J. W. Demmel, and K. A. Yelick, When cache blocking of sparse matrix vector multiply works and why, Applicable Algebra in Engineering, Communication and Computing, vol.18, issue.3, pp.297-311, 2007.
DOI : 10.1007/s00200-007-0038-9

D. Guo and W. Gropp, Applications of the streamed storage format for sparse matrix operations, International Journal of High Performance Computing Applications, vol.28, issue.1, pp.3-12, 2014.
DOI : 10.1177/1094342012470469

A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta, Extending the OpenMP Tasking Model to Allow Dependent Tasks, OpenMP in a New Era of Parallelism, 4th International Workshop, pp.111-122, 2008.
DOI : 10.1007/978-3-540-79561-2_10

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, Computing in Science and Engineering, vol.99, issue.1, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00930217

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, and R. van de Geijn, Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397

G. Quintana-Ortí, E. S. Quintana-Ortí, E. Chan, R. A. van de Geijn, and F. G. Van Zee, Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), pp.301-310, 2008.
DOI : 10.1109/PDP.2008.37

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp.217-224, 2011.
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011.
DOI : 10.1109/IPDPS.2011.299

A. Kawai, T. Fukushige, J. Makino, and M. Taiji, GRAPE-5: A special-purpose computer for N-body simulations, Publications of the Astronomical Society of Japan, pp.659-676, 2000.

A. W. Appel, An efficient program for many-body simulation, SIAM Journal on Scientific and Statistical Computing, vol.6, issue.1, pp.85-103, 1985.

K. Esselink, A comparison of algorithms for long-range interactions, Computer Physics Communications, vol.87, issue.3, pp.375-395, 1995.
DOI : 10.1016/0010-4655(95)00003-X

J. Barnes and P. Hut, A hierarchical O(N log N) force-calculation algorithm, Nature, vol.324, issue.6096, pp.446-449, 1986.
DOI : 10.1038/324446a0

L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

B. A. Cipra, The best of the 20th century: editors name top 10 algorithms, SIAM News, vol.33, issue.4, pp.1-2, 2000.

M. S. Warren and J. K. Salmon, A parallel hashed oct-tree N-body algorithm, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, pp.12-21, 1993.

G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing, International Business Machines Company, 1966.

D. Hilbert, Ueber die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891.
DOI : 10.1007/BF01199431

N. Reissman, J. C. Meyer, and M. Jahre, A Study of Energy and Locality Effects Using Space-Filling Curves, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.815-822, 2014.
DOI : 10.1109/IPDPSW.2014.93

B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz, Analysis of the clustering properties of the Hilbert space-filling curve, IEEE Transactions on Knowledge and Data Engineering, vol.13, issue.1, pp.124-141, 2001.

P. M. Campbell, K. D. Devine, J. E. Flaherty et al., Dynamic octree load balancing using space-filling curves, p.68, 2003.

H. Haverkort, An inventory of three-dimensional Hilbert space-filling curves, arXiv preprint, 2011.

P. Fortin, Algorithmique hiérarchique parallèle haute performance pour les problèmes à N-corps, 2006.

S. Jamet, N. Rougemaille, J. Toussaint, and O. Fruchart, Head-to-head domain walls in one-dimensional nanostructures, 2014.
DOI : 10.1016/B978-0-08-100164-6.00025-4
URL : https://hal.archives-ouvertes.fr/hal-01090653

A. Etcheverry, P. Blanchard, L. Dupuy, and O. Coulaud, OptiDis: a MPI/OpenMP Dislocation Dynamics Code for Large Scale Simulations, The 7th MMM International Conference on Multiscale Materials Modeling, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01086371

P. Blanchard, A. Etcheverry, O. Coulaud, L. Dupuy, and M. Blétry, OptiDis: Toward fast anisotropic DD based on Stroh formalism, International Workshop on DD simulations, 2014.

M. Messner, B. Bramas, O. Coulaud, and E. Darve, Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method, arXiv e-prints, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00746089

P. Blanchard, O. Coulaud, and E. Darve, Fast hierarchical algorithms for generating Gaussian random fields, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01228519

L. Ying, G. Biros, D. Zorin, and H. Langston, A new parallel kernel-independent fast multipole method, Supercomputing ACM/IEEE Conference, p.14, 2003.

T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori et al., 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-62, 2009.

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros et al., Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010.
DOI : 10.1109/IPDPS.2010.5470415

Q. Hu, N. A. Gumerov, and R. Duraiswami, Scalable fast multipole methods on distributed heterogeneous architectures, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-36, 2011.
DOI : 10.1145/2063384.2063432

R. Yokota and L. A. Barba, Treecode and fast multipole method for N-body simulation with CUDA, GPU Computing Gems Emerald Edition, p.113, 2011.

D. Malhotra and G. Biros, PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials, Communications in Computational Physics, vol.18, issue.3, pp.808-830, 2015.

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2014.

T. Abboud, C. Pallud, and . Teissedre, Sonate: a parallel code for acoustics nonlinear oscillations and boundary-value problems for hamiltonian systems, 1982.

F. Q. Hu, An efficient solution of time domain boundary integral equations for acoustic scattering and its acceleration by graphics processing units, 19th AIAA/CEAS Aeroacoustics Conference, 2013.

T. Takahashi, An interpolation-based fast-multipole accelerated boundary integral equation method for the three-dimensional wave equation, Journal of Computational Physics, vol.258, pp.809-832, 2014.
DOI : 10.1016/j.jcp.2013.11.008

J. Tausch, A fast method for solving the heat equation by layer potentials, Journal of Computational Physics, vol.224, issue.2, pp.956-969, 2007.
DOI : 10.1016/j.jcp.2006.11.001

P. R. Amestoy, I. S. Duff, J. Koster et al., MUMPS: A multifrontal massively parallel solver, ERCIM News, vol.50, pp.14-15, 2002.

A. Etcheverry, Vers la simulation en dynamique des dislocations à grande échelle, 2015.

C. A. White and M. Head-Gordon, Rotating around the quartic angular momentum barrier in fast multipole method calculations, The Journal of Chemical Physics, vol.105, issue.12, pp.5061-5067, 1996.

H. Dachsel, Fast and accurate determination of the Wigner rotation matrices in the fast multipole method, The Journal of Chemical Physics, vol.124, issue.14, p.144115, 2006.
DOI : 10.1063/1.2194548

A. A. Ergin, B. Shanker, and E. Michielssen, Fast analysis of transient acoustic wave scattering from rigid bodies using the multilevel plane wave time domain algorithm, The Journal of the Acoustical Society of America, vol.107, issue.3, 2000.
DOI : 10.1121/1.428406

G. Sylvand, La méthode multipôle rapide en électromagnétisme. Performances, parallélisation, applications, 2002.

E. Darve, Méthodes multipôles rapides: Résolution des équations de Maxwell par formulations intégrales

P. Havé and E. Darve, A fast multipole method for Maxwell equations stable at all frequencies, Philosophical Transactions: Mathematical, Physical and Engineering Sciences, vol.362, issue.1816, pp.603-628, 2004.

E. Darve, The Fast Multipole Method: Numerical Implementation, Journal of Computational Physics, vol.160, issue.1, 2000.
DOI : 10.1006/jcph.2000.6451

A. A. Ergin, B. Shanker, and E. Michielssen, Fast Evaluation of Three-Dimensional Transient Wave Fields Using Diagonal Translation Operators, Journal of Computational Physics, vol.146, issue.1, pp.157-180, 1998.
DOI : 10.1006/jcph.1998.5908

M. Frigo and S. G. Johnson, FFTW: an adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.1381-1384, 1998.
DOI : 10.1109/ICASSP.1998.681704

D. Funaro, Polynomial approximation of differential equations, 1992.

L. N. Trefethen and D. Bau III, Numerical linear algebra, SIAM, vol.50, 1997.

S. C. Chapra and R. P. Canale, Numerical methods for engineers, 2012.

W. H. Press, Numerical recipes 3rd edition: The art of scientific computing, 2007.

A. D. Poularikas, Handbook of formulas and tables for signal processing, 1998.

Intel, Intel Intrinsics Guide, https://software.intel.com/sites

O. Mishchenko, Optimizing cache behavior of ray-driven volume rendering using space-filling curves, 2006.

A. Grama, Introduction to parallel computing, Pearson Education, 2003.