Next, we added an expanded presentation of our basic SAMPI load balancing simulation workflow, introduced at the beginning of Chapter 4, as well as the improved version of the workflow (Section 4.4), including spatial aggregation (Section 4.2) and application-level rescaling.
