Parallel Programming with Migratable Objects: Charm++ in Practice, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, vol.117, p.61, 2014. ,
Time and Frequency (Time-Domain) Characterization, Estimation, and Prediction of Precision Clocks and Oscillators, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol.34, issue.6, p.65, 1987. ,
OpenTuner, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation - PACT '14, p.31, 2014. ,
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Dimemas: Predicting MPI Applications Behaviour in Grid Environments, Proc. of the Workshop on Grid Applications and Programming Tools, p.34, 2003. ,
Adding Virtualization Capabilities to the Grid'5000 Testbed, Cloud Computing and Services Science, vol.367, p.63, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00946971
Toward Better Simulation of MPI Applications on Ethernet/TCP Networks, Proc. of the 4th Workshop on Performance Modeling, Benchmarking and Simulation, vol.8551, p.50, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00919507
Slim NoC, Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18, p.30, 2018. ,
Slim Fly: A Cost Effective Low-Diameter Network Topology, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, vol.139, p.29, 2014. ,
HAEC-SIM: A Simulation Framework for Highly Adaptive Energy-efficient Computing Platforms, Proceedings of the 8th International Conference on Simulation Tools and Techniques (SIMUTools). ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), p.34, 2015. ,
Post-failure recovery of MPI communication capability, The International Journal of High Performance Computing Applications, vol.27, p.22, 2013. ,
A Generic Mean Field Convergence Result for Systems of Interacting Objects, Fourth International Conference on the Quantitative Evaluation of Systems, IEEE, p.139, 2007. ,
Scalable Multi-Purpose Network Representation for Large Scale Distributed System Simulation, Proc. of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, vol.87, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00650233
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, p.31, 2013. ,
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends® in Machine Learning, vol.3, p.122, 2010. ,
CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms, Software: Practice and Experience, vol.41, p.33, 2011. ,
Toward Exascale Resilience: 2014 update, Supercomputing Frontiers and Innovations, vol.1, 2014. ,
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, p.38, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01017319
Parallel Algorithms, p.14, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00789466
Emulating High Performance Linpack on a Commodity Server at the Scale of a Supercomputer, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01654804
Thread-Local Storage Extension to Support Thread-Based MPI/OpenMP Applications, OpenMP in the Petascale Era, p.49, 2011. ,
From OpenCL to high-performance hardware on FPGAs, 22nd International Conference on Field Programmable Logic and Applications (FPL), p.28, 2012. ,
Including Variability in Large-Scale Cluster Power Models, IEEE Computer Architecture Letters, vol.11, issue.2, p.65, 2012. ,
Cellular burning in lean premixed turbulent hydrogen-air flames: Coupling experimental and computational analysis at the laboratory scale, Journal of Physics: Conference Series, vol.180, p.21, 2009. ,
Energy-efficient localised rollback via data flow analysis and frequency scaling, Proceedings of the 25th European MPI Users' Group Meeting - EuroMPI'18, p.140, 2018. ,
Simulating MPI Applications: The SMPI Approach, IEEE Transactions on Parallel and Distributed Systems, vol.28, pp.52-54, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01415484
The International Exascale Software Project roadmap, The International Journal of High Performance Computing Applications, vol.25, p.137, 2011. ,
With Extreme Computing, the Rules Have Changed, Computing in Science & Engineering, vol.19, issue.3, pp.52-62, 2017. ,
With Extreme Scale Computing the Rules Have Changed, Mathematical Software - ICMS, pp.3-6, 2016. ,
Batsim: a Realistic Language-Independent Resources and Jobs Management Systems Simulator, 20th Workshop on Job Scheduling Strategies for Parallel Processing, p.106, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01333471
Data Center Energy Consumption Modeling: A Survey, IEEE Communications Surveys & Tutorials, vol.18, p.90, 2016. ,
Scaling To A Million Cores And Beyond: Using LightWeight Simulation to Understand The Challenges Ahead On The Road To Exascale, Future Generation Computer Systems, vol.30, p.34, 2014. ,
Parallel and distributed simulation, 2015 Winter Simulation Conference (WSC), p.139, 2015. ,
Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.97-104, 2004. ,
Daubechies wavelets as a basis set for density functional pseudopotential calculations, The Journal of Chemical Physics, vol.129, issue.1, p.74, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-02194904
A Scheduler-Level Incentive Mechanism for Energy Efficiency in HPC, Proc. of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), p.106, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01230295
MPICH2: A New Start for MPI Implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol.48, p.15, 2002. ,
Energy-aware simulation with DVFS, Simulation Modelling Practice and Theory, vol.39, p.33, 2013. ,
A Large-Scale Wired Network Energy Model for Flow-Level Simulations, AINA 2019 - 33rd International Conference on Advanced Information Networking and Applications, p.105, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02020045
Introducing FIRESTARTER: A processor stress test utility, 2013 International Green Computing Conference Proceedings. IEEE, p.68, 2013. ,
Predicting the Performance and the Power Consumption of MPI Applications With SimGrid, working paper or preprint, vol.89, p.63, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01446134
Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node, 2017 IEEE International Conference on Cluster Computing, pp.92-102, 2017. ,
Two-way TCP connections, ACM SIGCOMM Computer Communication Review, vol.41, p.43, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00930963
LogGOPSim: Simulating Large-scale Applications in the LogGOPS Model, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, p.34, 2010. ,
Analyzing and Mitigating the Impact of Manufacturing Variability in Power-constrained Supercomputing, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC '15, vol.95, p.65, 2015. ,
Congestion avoidance and control, Symposium Proceedings on Communications Architectures and Protocols - SIGCOMM '88, p.121, 1988. ,
A Simulator for Large-scale Parallel Architectures, International Journal of Parallel and Distributed Systems, vol.1, issue.2, 2010. ,
A packet-level simulator of energy-aware cloud computing data centers, Journal of Supercomputing, vol.62, p.33, 2012. ,
A Simulation Workflow to Evaluate the Performance of Dynamic Load Balancing with Over-decomposition for Iterative Parallel Applications, Theses. Universidade Federal Do Rio Grande Do Sul, p.61, 2018. ,
URL : https://hal.archives-ouvertes.fr/tel-01962082
Technology-Driven, Highly-Scalable Dragonfly Topology, 2008 International Symposium on Computer Architecture, vol.138, p.29, 2008. ,
CHARM++, ACM SIGPLAN Notices, vol.28, p.31, 1993.
A case for random shortcut topologies for HPC interconnects, 39th Annual International Symposium on Computer Architecture (ISCA), IEEE, p.29, 2012. ,
Layout-conscious random topologies for HPC off-chip interconnects, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), p.29, 2013. ,
Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API, 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, p.37, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01197128
The Linux scheduler, Proceedings of the Eleventh European Conference on Computer Systems - EuroSys '16, p.68, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01295194
High Performance RDMA-Based MPI Implementation over InfiniBand, International Journal of Parallel Programming, vol.32, issue.3, pp.167-198, 2004. ,
Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, NETWORKING 2007, p.43, 2007. ,
Performance Evaluation and Prediction of Parallel Applications, Theses. Ecole normale supérieure, p.82, 2014. ,
URL : https://hal.archives-ouvertes.fr/tel-00951125
From Large-Eddy Simulation to Direct Numerical Simulation of a lean premixed swirl flame: Filtered laminar flame-PDF modeling, Combustion and Flame, vol.158, issue.7, pp.1340-1357, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01672168
Profile-based power shifting in interconnection networks with on/off links, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, vol.37, p.70, 2015. ,
Modeling a Million-Node Dragonfly Network Using Massively Parallel Discrete-Event Simulation, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, p.139, 2012. ,
Enabling Parallel Simulation of Large-Scale HPC Network Systems, IEEE Transactions on Parallel and Distributed Systems, vol.28, p.34, 2017. ,
Producing wrong data without doing anything obviously wrong!, Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.265-276, 2009. ,
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing, Journal of Parallel and Distributed Computing, vol.69, p.82, 2009. ,
Architectural Simulators Considered Harmful, IEEE Micro, vol.35, issue.6, p.33, 2015. ,
A Survey on Techniques for Improving the Energy Efficiency of Large-Scale Distributed Systems, ACM Computing Surveys (CSUR), vol.46, pp.89-91, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00767582
Dynamic Cloud Provisioning for Scientific Grid Workflows, Proc. of the 11th ACM/IEEE Intl. Conf. on Grid Computing (Grid), p.33, 2010. ,
Saving energy by exploiting residual imbalances on iterative applications, 2014 21st International Conference on High Performance Computing (HiPC), 2014. ,
Parallel Simulation of Peer-to-Peer Systems, CCGrid 2012 - The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, p.139, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00602216
R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, vol.72, p.71, 2016.
The Mont-Blanc Prototype: An Alternative Approach for HPC Systems, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, p.25, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01354939
Energy-aware scheduling: complexity and algorithms, Theses. Ecole normale supérieure de Lyon - ENS LYON, p.89, 2012. ,
URL : https://hal.archives-ouvertes.fr/tel-00744247
Adagio, Proceedings of the 23rd International Conference on Supercomputing - ICS '09, vol.3, p.129, 2009. ,
A Multi-Language Computing Environment for Literate Programming and Reproducible Research, Journal of Statistical Software, vol.46, issue.3, p.71, 2012. ,
Using Machine Learning for Data Center Cooling Infrastructure Efficiency Prediction, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017. ,
A Framework for Performance Modeling and Prediction, Proc. of the ACM/IEEE Conference on Supercomputing, p.34, 2002. ,
A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic HPC Applications, p.58, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01248109
Collecting Performance Data with PAPI-C, Tools for High Performance Computing, p.59, 2009. ,
Performance modeling of a geophysics application to accelerate over-decomposition parameter tuning through simulation, Concurrency and Computation: Practice and Experience, p.5012, 2018. ,
DCSim: a data centre simulation tool for evaluating dynamic virtualized resource management, Int. Conf. on Network and Service Management, p.33, 2012. ,
Technologies and Future Prospects of Green Supercomputer ZettaScaler, C 100, p.27, 2017. ,
A bridging model for parallel computation, Communications of the ACM, vol.33, issue.8, p.107, 1990. ,
On the Validity of Flow-level TCP Network Models for Grid and Cloud Simulations, ACM Transactions on Modeling and Computer Simulation, vol.23, p.43, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00872476
Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband, p.52, 2010. ,
Assessing the Energy Efficiency of High Performance Computing (HPC) Data Centers, p.33, 2018. ,
Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, p.31, 2001. ,
BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th IPDPS, p.34, 2004. ,
OpenManage Deployment Toolkit Version 4.4 Command Line Interface Reference Guide, p.66, 2019. ,
Network Co-design as a Gateway to Exascale, p.23, 2016. ,
Intel Processors and FPGAs - Better Together, p.28, 2018.
Kalray announces the release of its third-generation MPPA® processor "Coolidge", p.27, 2017. ,
Comparison of calibrated (blue) and uncalibrated (green) runs of LU with real experiments (red) and ideal scaling (grey).
As can be seen, there is significantly less jitter for local communications than for inter-node communications. We furthermore found that small messages were sent faster over network links than over shared memory, since the send operation was executed asynchronously for remote destinations. For large messages, the loopback was almost an order of magnitude faster, which is expected given its much higher bandwidth.
These measurements were repeated three times, each no earlier than 3 weeks after the previous measurement. Note that the y-axis begins at 85 W. The variation per core count is therefore minimal.
Changing the frequency while keeping the load constant causes the consumed energy to grow quadratically. The energy required to keep an idling node on (despite all energy-efficiency measures, such as C-states) is additionally shown here as P_idle.
Varying the number of cores while keeping the workload constant (NAS-EP, class C) reveals a linear connection between load and power consumption. Note that not all frequencies are shown in this figure to reduce overplotting. The energy required to keep an idling node on (despite all energy-efficiency measures, such as C-states) is shown here as P_idle.
Power consumption over time when running NAS-EP, NAS-LU, HPL or idling (with 12 active cores and the frequency set to 2300 MHz).
Alternating between the highest frequency and the idle state (represented by frequency "0") consumes more power than running at a reduced frequency (with the same finishing time). The power data were obtained from actual measurements of NAS-LU.
Using Adagio with Ondes3D works for about 15 iterations, after which even the most loaded host, taurus-16, starts to slow down.
On the Stampede supercomputer, one "slow" and one "fast" mode seem to exist for MPI_Recv and MPI_Send (depicted) calls, making faithful simulation very difficult.