Recent Advances and Emerging Applications of the Boundary Element Method, Applied Mechanics Reviews, vol.64, issue.3, p.30802, 2011. ,
DOI : 10.1115/1.4005491
URL : https://hal.archives-ouvertes.fr/hal-01401752
Résolution mathématique et numérique des équations de Maxwell instationnaires par une méthode de potentiels retardés, 1993. ,
Équation des Ondes en Acoustique : Accélération des Potentiels Retardés par la Méthode Multipôle Temporelle, 2003. ,
The boundary element method with programming: for engineers and scientists, 2008. ,
OpenMP application programming interface, v3.0, 2008. ,
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, p.12037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
The libflame library for dense matrix computations, Computing in Science & Engineering, vol.11, issue.6, pp.56-63, 2009. ,
Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014. ,
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013. ,
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation, Euro-Par 2011: Parallel Processing Workshops, pp.345-354, 2012. ,
DOI : 10.1007/978-3-642-29740-3_39
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing, 2010 IEEE International Conference on Cluster Computing, pp.19-28, 2010. ,
DOI : 10.1109/CLUSTER.2010.12
Parallel scientific computing in C++ and MPI: a seamless approach to parallel algorithms and their implementation, 2003. ,
Handbook for automatic computation: linear algebra, 1971. ,
DOI : 10.1007/978-3-642-86940-2
Automatic performance tuning of sparse matrix kernels, 2003. ,
Optimizing the Performance of Sparse Matrix-Vector Multiplication, 2000. ,
On improving the performance of sparse matrix-vector multiplication, High-Performance Computing Proceedings. Fourth International Conference on, pp.66-71, 1997. ,
Improving the memory-system performance of sparse-matrix vector multiplication, IBM Journal of Research and Development, vol.41, issue.6, pp.711-725, 1997. ,
DOI : 10.1147/rd.416.0711
Improving performance of sparse matrix-vector multiplication, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '99, p.30, 1999. ,
DOI : 10.1145/331532.331562
Sparsity: Optimization Framework for Sparse Matrix Kernels, International Journal of High Performance Computing Applications, vol.18, issue.1, pp.135-158, 2004. ,
DOI : 10.1177/1094342004041296
Fast sparse matrix-vector multiplication by exploiting variable block structure, High Performance Computing and Communications, pp.807-816, 2005. ,
Sparskit: a basic tool kit for sparse matrix computations, 1994. ,
Iterative methods for sparse linear systems, second edition, 2003. ,
OSKI: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, p.521, 2005. ,
DOI : 10.1088/1742-6596/16/1/071
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY, Computational Science - ICCS 2001, pp.127-136, 2001. ,
DOI : 10.1007/3-540-45545-0_22
Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference, pp.157-172, 1969. ,
DOI : 10.1145/800195.805928
Performance optimization of irregular codes based on the combination of reordering and blocking techniques, Parallel Computing, vol.31, issue.8-9, pp.858-876, 2005. ,
DOI : 10.1016/j.parco.2005.04.012
Utilizing recursive storage in sparse matrix-vector multiplication-preliminary considerations, CATA, pp.300-305, 2010. ,
Fast sparse matrix-vector multiplication by partitioning and reordering, 2011. ,
A Hilbert-order multiplication scheme for unstructured sparse matrices, International Journal of Parallel, Emergent and Distributed Systems, vol.22, issue.4, pp.213-220, 2007. ,
DOI : 10.1007/s006070070032
Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Parallel Computing, vol.35, issue.3, pp.178-194, 2009. ,
DOI : 10.1016/j.parco.2008.12.006
Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures, Concurrency and Computation: Practice and Experience, pp.1838-1856, 2009. ,
A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication, SIAM Review, vol.47, issue.1, pp.67-95, 2005. ,
DOI : 10.1137/S0036144502409019
Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pp.233-244, 2009. ,
A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures, 2009 International Conference on Computational Science and Engineering, pp.247-256, 2009. ,
DOI : 10.1109/CSE.2009.223
Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies, IBM Research Report, pp.24704-0812, 2008. ,
Sparse matrix computations on manycore GPU's, Proceedings of the 45th annual conference on Design automation, DAC '08, pp.2-6, 2008. ,
DOI : 10.1145/1391469.1391473
Efficient sparse matrix-vector multiplication on CUDA, 2008. ,
Implementing sparse matrix-vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.18, 2009. ,
DOI : 10.1145/1654059.1654078
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures, High Performance Embedded Architectures and Compilers, pp.111-125, 2010. ,
DOI : 10.1007/978-3-642-11515-8_10
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, p.12, 2011. ,
DOI : 10.1145/1964179.1964196
On the limits of gpu acceleration, Proceedings of the 2nd USENIX conference on Hot topics in parallelism, pp.13-13, 2010. ,
A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices, Journal of Parallel and Distributed Computing, vol.75, pp.133-140, 2015. ,
DOI : 10.1016/j.jpdc.2014.09.003
Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels, 2009 International Conference on Parallel Processing, pp.356-364, 2009. ,
DOI : 10.1109/ICPP.2009.21
When cache blocking of sparse matrix vector multiply works and why, Applicable Algebra in Engineering, Communication and Computing, vol.18, issue.3, pp.297-311, 2007. ,
DOI : 10.1007/s00200-007-0038-9
Applications of the streamed storage format for sparse matrix operations, International Journal of High Performance Computing Applications, vol.28, issue.1, pp.3-12, 2014. ,
DOI : 10.1177/1094342012470469
Extending the OpenMP Tasking Model to Allow Dependent Tasks, OpenMP in a New Era of Parallelism, 4th International Workshop, pp.111-122, 2008. ,
DOI : 10.1007/978-3-540-79561-2_10
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, Computing in Science and Engineering, vol.99, issue.1, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00930217
QUARK users' guide: QUeueing And Runtime for Kernels, 2011. ,
SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007. ,
DOI : 10.1145/1248377.1248397
Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), pp.301-310, 2008. ,
DOI : 10.1109/PDP.2008.37
Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009. ,
DOI : 10.1145/1594835.1504196
LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp.217-224, 2011. ,
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011. ,
DOI : 10.1109/IPDPS.2011.299
GRAPE-5: A special-purpose computer for N-body simulations, Publications of the Astronomical Society of Japan, pp.659-676, 2000. ,
An efficient program for many-body simulation, SIAM Journal on Scientific and Statistical Computing, vol.6, issue.1, pp.85-103, 1985. ,
A comparison of algorithms for long-range interactions, Computer Physics Communications, vol.87, issue.3, pp.375-395, 1995. ,
DOI : 10.1016/0010-4655(95)00003-X
A hierarchical O(N log N) force-calculation algorithm, Nature, vol.324, issue.6096, pp.446-449, 1986. ,
DOI : 10.1038/324446a0
A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987. ,
DOI : 10.1016/0021-9991(87)90140-9
The best of the 20th century: editors name top 10 algorithms, SIAM news, vol.33, issue.4, pp.1-2, 2000. ,
A parallel hashed oct-tree n-body algorithm, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, pp.12-21, 1993. ,
A computer oriented geodetic data base and a new technique in file sequencing, International Business Machines Company, 1966. ,
Ueber die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891. ,
DOI : 10.1007/BF01199431
A Study of Energy and Locality Effects Using Space-Filling Curves, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.815-822, 2014. ,
DOI : 10.1109/IPDPSW.2014.93
Analysis of the clustering properties of the Hilbert space-filling curve, IEEE Transactions on Knowledge and Data Engineering, vol.13, issue.1, pp.124-141, 2001. ,
Dynamic octree load balancing using space-filling curves, p.68, 2003. ,
An inventory of three-dimensional Hilbert space-filling curves, arXiv preprint, 2011. ,
Algorithmique hiérarchique parallèle haute performance pour les problèmes à N-corps, 2006. ,
Head-to-head domain walls in one-dimensional nanostructures, 2014. ,
DOI : 10.1016/B978-0-08-100164-6.00025-4
URL : https://hal.archives-ouvertes.fr/hal-01090653
OptiDis: a MPI/OpenMP Dislocation Dynamics Code for Large Scale Simulations, The 7th MMM International Conference on Multiscale Materials Modeling, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01086371
OptiDis: Toward fast anisotropic DD based on Stroh formalism, International Workshop on DD simulations, 2014. ,
Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method, arXiv e-prints, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00746089
Fast hierarchical algorithms for generating Gaussian random fields, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01228519
A new parallel kernel-independent fast multipole method, Supercomputing ACM/IEEE Conference, pp.14-14, 2003. ,
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-62, 2009. ,
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010. ,
DOI : 10.1109/IPDPS.2010.5470415
Scalable fast multipole methods on distributed heterogeneous architectures, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '11, pp.1-36, 2011. ,
DOI : 10.1145/2063384.2063432
Treecode and fast multipole method for N-body simulation with CUDA, GPU Computing Gems Emerald Edition, p.113, 2011. ,
Abstract, Communications in Computational Physics, vol.42, issue.03, pp.808-830, 2015. ,
DOI : 10.1109/8.633855
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2014. ,
Sonate: a parallel code for acoustics nonlinear oscillations and boundary-value problems for hamiltonian systems, 1982. ,
An efficient solution of time domain boundary integral equations for acoustic scattering and its acceleration by graphics processing units, 19th AIAA/CEAS Aeroacoustics Conference, 2013. ,
An interpolation-based fast-multipole accelerated boundary integral equation method for the three-dimensional wave equation, Journal of Computational Physics, vol.258, pp.809-832, 2014. ,
DOI : 10.1016/j.jcp.2013.11.008
A fast method for solving the heat equation by layer potentials, Journal of Computational Physics, vol.224, issue.2, pp.956-969, 2007. ,
DOI : 10.1016/j.jcp.2006.11.001
Mumps: A multifrontal massively parallel solver, ERCIM News, vol.50, pp.14-15, 2002. ,
Vers la simulation en dynamique des dislocations à grande échelle, 2015. ,
Rotating around the quartic angular momentum barrier in fast multipole method calculations, The Journal of Chemical Physics, vol.105, issue.12, pp.5061-5067, 1996. ,
Fast and accurate determination of the Wigner rotation matrices in the fast multipole method, The Journal of Chemical Physics, vol.124, issue.14, p.144115, 2006. ,
DOI : 10.1063/1.2194548
Fast analysis of transient acoustic wave scattering from rigid bodies using the multilevel plane wave time domain algorithm, The Journal of the Acoustical Society of America, vol.107, issue.3, 2000. ,
DOI : 10.1121/1.428406
La méthode multipôle rapide en électromagnétisme. Performances, parallélisation, applications, 2002. ,
Méthodes multipôles rapides: Résolution des équations de Maxwell par formulations intégrales ,
A fast multipole method for Maxwell equations stable at all frequencies, Philosophical Transactions: Mathematical, Physical and Engineering Sciences, vol.362, issue.1816, pp.603-628, 2004. ,
The Fast Multipole Method: Numerical Implementation, Journal of Computational Physics, vol.160, issue.1, 2000. ,
DOI : 10.1006/jcph.2000.6451
Fast Evaluation of Three-Dimensional Transient Wave Fields Using Diagonal Translation Operators, Journal of Computational Physics, vol.146, issue.1, pp.157-180, 1998. ,
DOI : 10.1006/jcph.1998.5908
FFTW: an adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.1381-1384, 1998. ,
DOI : 10.1109/ICASSP.1998.681704
Polynomial approximation of differential equations, 1992. ,
Numerical linear algebra, SIAM, vol.50, 1997. ,
Numerical methods for engineers, 2012. ,
Numerical recipes 3rd edition: The art of scientific computing, 2007. ,
Handbook of formulas and tables for signal processing, 1998. ,
Intel Intrinsics Guide. https://software.intel.com/sites ,
Optimizing cache behavior of ray-driven volume rendering using space-filling curves, 2006. ,
Introduction to parallel computing, Pearson Education, 2003. ,