B. Acun, A. Gupta, and N. Jain, Parallel Programming with Migratable Objects: Charm++ in Practice, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, vol.117, p.61, 2014.

D. W. Allan, Time and Frequency (Time-Domain) Characterization, Estimation, and Prediction of Precision Clocks and Oscillators, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol.34, issue.6, p.65, 1987.

J. Ansel, S. Kamil, and K. Veeramachaneni, OpenTuner, Proceedings of the 23rd international conference on Parallel architectures and compilation -PACT '14, p.31, 2014.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00384363

R. M. Badia, J. Labarta, J. Giménez, and F. Escalé, Dimemas: Predicting MPI Applications Behaviour in Grid Environments, Proc. of the Workshop on Grid Applications and Programming Tools, p.34, 2003.

D. Balouek, A. Carpen-amarie, and G. Charrier, Adding Virtualization Capabilities to the Grid'5000 Testbed, Cloud Computing and Services Science, vol.367, p.63, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00946971

P. Bedaride, A. Degomme, and S. Genaud, Toward Better Simulation of MPI Applications on Ethernet/TCP Networks, Proc. of the 4th Intl. Workshop on Performance Modeling, Benchmarking and Simulation, vol.8551, p.50, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00919507

M. Besta, S. M. Hassan, and S. Yalamanchili, Slim NoC, Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems -ASPLOS '18, p.30, 2018.

M. Besta and T. Hoefler, Slim Fly: A Cost Effective Low-Diameter Network Topology, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, vol.139, p.29, 2014.

M. Bielert, F. M. Ciorba, K. Feldhoff, T. Ilsche, and W. E. Nagel, HAEC-SIM: A Simulation Framework for Highly Adaptive Energy-efficient Computing Platforms, Proceedings of the 8th International Conference on Simulation Tools and Techniques (SIMUTools), ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), p.34, 2015.

W. Bland, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Post-failure recovery of MPI communication capability, The International Journal of High Performance Computing Applications, vol.27, p.22, 2013.

J.-Y. Le Boudec, D. McDonald, and J. Mundinger, A Generic Mean Field Convergence Result for Systems of Interacting Objects, Fourth International Conference on the Quantitative Evaluation of Systems (QEST 2007), IEEE, p.139, 2007.

L. Bobelin, A. Legrand, and D. Márquez, Scalable Multi-Purpose Network Representation for Large Scale Distributed System Simulation, Proc. of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, vol.87, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00650233

G. Bosilca, A. Bouteiller, and A. Danalis, PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, p.31, 2013.

S. Boyd, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends® in Machine Learning, vol.3, p.122, 2010.

R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De-rose, and R. Buyya, CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms, Software: Practice and Experience, vol.41, p.33, 2011.

F. Cappello, A. Geist, and W. Gropp, Toward Exascale Resilience: 2014 update, Supercomputing Frontiers and Innovations, vol.1, 2014.

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, p.38, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01017319

H. Casanova, A. Legrand, and Y. Robert, Parallel Algorithms, p.14, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00789466

T. Cornebize, F. C. Heinrich, A. Legrand, and J. Vienne, Emulating High Performance Linpack on a Commodity Server at the Scale of a Supercomputer, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01654804

P. Carribault, M. Pérache, and H. Jourdren, Thread-Local Storage Extension to Support Thread-Based MPI/OpenMP Applications, OpenMP in the Petascale Era, p.49, 2011.

T. S. Czajkowski, U. Aydonat, and D. Denisenko, From OpenCL to high-performance hardware on FPGAs, 22nd International Conference on Field Programmable Logic and Applications (FPL), p.28, 2012.

J. D. Davis, S. Rivoire, M. Goldszmidt, and E. K. Ardestani, Including Variability in Large-Scale Cluster Power Models, IEEE Computer Architecture Letters, vol.11, issue.2, p.65, 2012.

M. Day, J. Bell, and R. Cheng, Cellular burning in lean premixed turbulent hydrogen-air flames: Coupling experimental and computational analysis at the laboratory scale, Journal of Physics: Conference Series, vol.180, p.21, 2009.

K. Dichev, K. Cameron, and D. S. Nikolopoulos, Energy-efficient localised rollback via data flow analysis and frequency scaling, Proceedings of the 25th European MPI Users' Group Meeting on -EuroMPI'18, p.140, 2018.

A. Degomme, A. Legrand, and G. S. Markomanolis, Simulating MPI Applications: The SMPI Approach, IEEE Transactions on Parallel and Distributed Systems, vol.28, pp.52-54, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01415484

J. Dongarra, P. Beckman, and T. Moore, The International Exascale Software Project roadmap, The International Journal of High Performance Computing Applications, vol.25, p.137, 2011.

J. Dongarra, S. Tomov, and P. Luszczek, With Extreme Computing, the Rules Have Changed, Computing in Science & Engineering, vol.19, issue.3, pp.52-62, 2017.

J. Dongarra, With Extreme Scale Computing the Rules Have Changed, Mathematical Software -ICMS, pp.3-6, 2016.

P. Dutot, M. Mercier, M. Poquet, and O. Richard, Batsim: a Realistic Language-Independent Resources and Jobs Management Systems Simulator, 20th Workshop on Job Scheduling Strategies for Parallel Processing, p.106, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01333471

M. Dayarathna, Y. Wen, and R. Fan, Data Center Energy Consumption Modeling: A Survey, IEEE Communications Surveys & Tutorials, vol.18, p.90, 2016.

C. Engelmann, Scaling to a Million Cores and Beyond: Using Light-Weight Simulation to Understand the Challenges Ahead on the Road to Exascale, Future Generation Computer Systems, vol.30, p.34, 2014.

R. Fujimoto, Parallel and distributed simulation, 2015 Winter Simulation Conference (WSC), p.139, 2015.

E. Gabriel, G. E. Fagg, and G. Bosilca, Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.97-104, 2004.

L. Genovese, A. Neelov, and S. Goedecker, Daubechies wavelets as a basis set for density functional pseudopotential calculations, The Journal of Chemical Physics, vol.129, issue.1, p.74, 2008.
URL : https://hal.archives-ouvertes.fr/hal-02194904

Y. Georgiou, D. Glesser, K. Rzadca, and D. Trystram, A Scheduler-Level Incentive Mechanism for Energy Efficiency in HPC, Proc. of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), p.106, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01230295

W. Gropp, MPICH2: A New Start for MPI Implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol.48, p.15, 2002.

T. Guérout, T. Monteil, and G. D. Costa, Energy-aware simulation with DVFS, Simulation Modelling Practice and Theory, vol.39, p.33, 2013.

L. Guegan, B. L. Amersho, A.-C. Orgerie, and M. Quinson, A Large-Scale Wired Network Energy Model for Flow-Level Simulations, AINA 2019 -33rd International Conference on Advanced Information Networking and Applications, p.105, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02020045

D. Hackenberg, R. Oldenburg, D. Molka, and R. Schone, Introducing FIRESTARTER: A processor stress test utility, 2013 International Green Computing Conference Proceedings. IEEE, p.68, 2013.

F. C. Heinrich, A. Carpen-amarie, and A. Degomme, Predicting the Performance and the Power Consumption of MPI Applications With SimGrid, working paper or preprint, vol.89, p.63, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01446134

F. Heinrich, T. Cornebize, and A. Degomme, Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node, 2017 IEEE International Conference on Cluster Computing, pp.92-102, 2017.

M. Heusse, S. A. Merritt, T. X. Brown, and A. Duda, Two-way TCP connections, ACM SIGCOMM Computer Communication Review, vol.41, p.43, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00930963

T. Hoefler, T. Schneider, and A. Lumsdaine, LogGOPSim: Simulating Large-scale Applications in the LogGOPS Model, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, p.34, 2010.

Y. Inadomi, T. Patki, and K. Inoue, Analyzing and Mitigating the Impact of Manufacturing Variability in Power-constrained Supercomputing, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC '15, vol.95, p.65, 2015.

V. Jacobson, Congestion avoidance and control, Symposium proceedings on Communications architectures and protocols -SIGCOMM '88, p.121, 1988.

C. L. Janssen, H. Adalsteinsson, and S. Cranford, A Simulator for Large-scale Parallel Architectures, International Journal of Parallel and Distributed Systems, vol.1, issue.2, 2010.

D. Kliazovich, P. Bouvry, and S. U. Khan, A packet-level simulator of energy-aware cloud computing data centers, Journal of Supercomputing, vol.62, p.33, 2012.

R. Keller Tesser, A Simulation Workflow to Evaluate the Performance of Dynamic Load Balancing with Over-decomposition for Iterative Parallel Applications, Theses, Universidade Federal do Rio Grande do Sul, p.61, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01962082

J. Kim, W. J. Dally, S. Scott, and D. Abts, Technology-Driven, Highly-Scalable Dragonfly Topology, 2008 International Symposium on Computer Architecture, vol.138, p.29, 2008.

L. V. Kale and S. Krishnan, CHARM++, ACM SIGPLAN Notices, vol.28, p.31, 1993.

M. Koibuchi, H. Matsutani, H. Amano, D. F. Hsu, and H. Casanova, A case for random shortcut topologies for HPC interconnects, 39th Annual International Symposium on Computer Architecture (ISCA), IEEE, p.29, 2012.

M. Koibuchi, I. Fujiwara, H. Matsutani, and H. Casanova, Layout-conscious random topologies for HPC off-chip interconnects, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), p.29, 2013.

A. Lebre, A. Legrand, F. Suter, and P. Veyre, Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API, 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, p.37, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01197128

J. Lozi, B. Lepers, and J. Funston, The Linux scheduler, Proceedings of the Eleventh European Conference on Computer Systems -EuroSys '16, p.68, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01295194

J. Liu, J. Wu, and D. K. Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, International Journal of Parallel Programming, vol.32, issue.3, pp.167-198, 2004.

G. Marfia, C. Palazzi, and G. Pau, Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, NETWORKING 2007, p.43, 2007.

G. Markomanolis, Performance Evaluation and Prediction of Parallel Applications, Theses. Ecole normale supérieure, p.82, 2014.
URL : https://hal.archives-ouvertes.fr/tel-00951125

V. Moureau, P. Domingo, and L. Vervisch, From Large-Eddy Simulation to Direct Numerical Simulation of a lean premixed swirl flame: Filtered laminar flame-PDF modeling, Combustion and Flame, vol.158, issue.7, pp.1340-1357, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01672168

S. Miwa and H. Nakamura, Profile-based power shifting in interconnection networks with on/off links, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, vol.37, p.70, 2015.

M. Mubarak, C. D. Carothers, R. Ross, and P. Carns, Modeling a Million-Node Dragonfly Network Using Massively Parallel Discrete-Event Simulation, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, p.139, 2012.

M. Mubarak, C. D. Carothers, R. B. Ross, and P. H. Carns, Enabling Parallel Simulation of Large-Scale HPC Network Systems, IEEE Transactions on Parallel and Distributed Systems, vol.28, p.34, 2017.

T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney, Producing wrong data without doing anything obviously wrong!, Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.265-276, 2009.

M. Noeth, P. Ratn, F. Mueller, M. Schulz, and B. R. De-supinski, ScalaTrace: Scalable compression and replay of communication traces for high-performance computing, Journal of Parallel and Distributed Computing, vol.69, p.82, 2009.

T. Nowatzki, J. Menon, C. H. Ho, and K. Sankaralingam, Architectural Simulators Considered Harmful, IEEE Micro, vol.35, issue.6, p.33, 2015.

A. Orgerie, M. Dias-de-assunção, and L. Lefèvre, A Survey on Techniques for Improving the Energy Efficiency of Large-Scale Distributed Systems, ACM Computing Surveys (CSUR), vol.46, pp.89-91, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00767582

S. Ostermann, R. Prodan, and T. Fahringer, Dynamic Cloud Provisioning for Scientific Grid Workflows, Proc. of the 11th ACM/IEEE Intl. Conf. on Grid Computing (Grid), p.33, 2010.

E. L. Padoin, M. Castro, L. L. Pilla, P. O. A. Navaux et al., Saving energy by exploiting residual imbalances on iterative applications, 2014 21st International Conference on High Performance Computing (HiPC), 2014.

M. Quinson, C. Rosa, and C. Thiery, Parallel Simulation of Peer-to-Peer Systems, Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '12), p.139, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00602216

R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, vol.72, p.71, 2016.

N. Rajovic, A. Rico, and F. Mantovani, The Mont-Blanc Prototype: An Alternative Approach for HPC Systems, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, p.25, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01354939

P. Renaud-goud, Energy-aware scheduling: complexity and algorithms, Theses, Ecole normale supérieure de lyon -ENS LYON, p.89, 2012.
URL : https://hal.archives-ouvertes.fr/tel-00744247

B. Rountree, D. K. Lowenthal, and B. R. De-supinski, Adagio, Proceedings of the 23rd international conference on Supercomputing -ICS '09, vol.3, p.129, 2009.

E. Schulte, D. Davison, T. Dye, and C. Dominik, A Multi-Language Computing Environment for Literate Programming and Reproducible Research, Journal of Statistical Software, vol.46, issue.3, p.71, 2012.

H. Shoukourian, T. Wilde, D. Labrenz, and A. Bode, Using Machine Learning for Data Center Cooling Infrastructure Efficiency Prediction, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017.

A. Snavely, L. Carrington, and N. Wolter, A Framework for Performance Modeling and Prediction, Proc. of the ACM/IEEE Conference on Supercomputing, p.34, 2002.

L. Stanisic, A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic HPC Applications, p.58, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01248109

D. Terpstra, H. Jagode, H. You, and J. Dongarra, Collecting Performance Data with PAPI-C, Tools for High Performance Computing, p.59, 2009.

R. Keller-tesser, L. M. Schnorr, and A. Legrand, Performance modeling of a geophysics application to accelerate over-decomposition parameter tuning through simulation, Concurrency and Computation: Practice and Experience, p.5012, 2018.

M. Tighe, G. Keller, M. Bauer, and H. Lutfiyya, DCSim: a data centre simulation tool for evaluating dynamic virtualized resource management, Int. Conf. on Network and Service Management, p.33, 2012.

T. Sunao, I. Hitoshi, K. Yasuyuki, and M. Saitoh, Technologies and Future Prospects of Green Supercomputer ZettaScaler, C 100, p.27, 2017.

L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol.33, issue.8, p.107, 1990.

P. Velho, L. Schnorr, H. Casanova, and A. Legrand, On the Validity of Flow-level TCP Network Models for Grid and Cloud Simulations, ACM Transactions on Modeling and Computer Simulation, vol.23, p.43, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00872476

J. Vienne, Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband, p.52, 2010.

T. Wilde, Assessing the Energy Efficiency of High Performance Computing (HPC) Data Centers, p.33, 2018.

R. C. Whaley, A. Petitet, and J. J. Dongarra, Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, p.31, 2001.

G. Zheng, G. Kakulapati, and L. Kale, BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th IPDPS, p.34, 2004.

Webpages

Dell, OpenManage Deployment Toolkit Version 4.4 Command Line Interface Reference Guide, p.66, 2019.

D. Eadline, Network Co-design as a Gateway to Exascale, p.23, 2016.

MPI Forum, p.15, 2019.

J. Huffstetler, Intel Processors and FPGAs -Better Together, p.28, 2018.

Kalray, Kalray announces the release of its third-generation MPPA® processor "Coolidge", p.27, 2017.

T. Kidd, p.91, 2008.

NVIDIA, CUDA Zone | NVIDIA Developer, p.11, 2019.

Comparison of calibrated (blue) and uncalibrated (green) runs of LU with real experiments (red) and ideal scaling (grey), p.79

As can be seen, there is significantly less jitter for local communications than for inter-node communications. We furthermore found that small messages were sent faster over network links than over shared memory, since the send operation was executed asynchronously for remote destinations. For large messages, the loopback was almost an order of magnitude faster, which is expected given its much higher bandwidth.

These measurements were repeated three times, each no earlier than 3 weeks after the previous measurement. Note that the y-axis begins at 85 W. The variation per core count is therefore minimal, p.92

Changing the frequency while keeping the load constant causes the consumed energy to grow quadratically. The energy that is required to keep an idling node on (despite all energy-efficiency measures, such as C-states) is additionally shown here as P_idle.

Varying the number of cores while keeping the workload constant (NAS-EP, class C) reveals a linear connection between load and power consumption. Note that not all frequencies are shown in this figure to reduce overplotting. The energy that is required to keep an idling node on (despite all energy-efficiency measures, such as C-states) is shown here as P_idle.

Power consumption over time when running NAS-EP, NAS-LU, HPL or idling (with 12 active cores and the frequency set to 2300 MHz), p.95

Alternating between the highest frequency and the idle state (represented by frequency "0") consumes more power than running at a reduced frequency (with the same finishing time). The power data was obtained by actual measurements of NAS-LU.

Using Adagio with Ondes3D works for about 15 iterations, after which even the most loaded host taurus-16 starts to slow down, p.118

On the Stampede supercomputer, one "slow" and one "fast" mode seem to exist for MPI_Recv and MPI_Send (depicted) calls, making faithful simulation very difficult.