Skip to Main content Skip to Navigation

A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic HPC Applications

Abstract : The evolution of High-Performance Computing systems has taken asharp turn in the last decade. Due to the enormous energyconsumption of modern platforms, miniaturization and frequencyscaling of processors have reached a limit. The energy constraintshas forced hardware manufacturers to develop alternative computerarchitecture solutions in order to manage answering the ever-growingneed of performance imposed by the scientists and thesociety. However, efficiently programming such diversity ofplatforms and fully exploiting the potentials of the numerousdifferent resources they offer is extremely challenging. Thepreviously dominant trend for designing high performanceapplications, which was based on large monolithic codes offeringmany optimization opportunities, has thus become more and moredifficult to apply since implementing and maintaining such complexcodes is very difficult. Therefore, application developersincreasingly consider modular approaches and dynamic applicationexecutions. A popular approach is to implement the application at ahigh level independently of the hardware architecture as DirectedAcyclic Graphs of tasks, each task corresponding to carefullyoptimized computation kernels for each architecture. A runtimesystem can then be used to dynamically schedule those tasks on thedifferent computing resources.Developing such solutions and ensuring their good performance on awide range of setups is however very challenging. Due to the highcomplexity of the hardware, to the duration variability of theoperations performed on a machine and to the dynamic scheduling ofthe tasks, the application executions are non-deterministic and theperformance evaluation of such systems is extremelydifficult. Therefore, there is a definite need for systematic andreproducible methods for conducting such research as well asreliable performance evaluation techniques for studying thesecomplex systems.In this thesis, we show that it is possible to perform a clean,coherent, reproducible study, using simulation, of dynamic HPCapplications. We propose a unique workflow based on two well-knownand widely-used tools, Git and Org-mode, for conducting areproducible experimental research. This simple workflow allows forpragmatically addressing issues such as provenance tracking and dataanalysis replication. Our contribution to the performance evaluationof dynamic HPC applications consists in the design and validation ofa coarse-grain hybrid simulation/emulation of StarPU, a dynamictask-based runtime for hybrid architectures, over SimGrid, aversatile simulator for distributed systems. We present how thistool can achieve faithful performance predictions of nativeexecutions on a wide range of heterogeneous machines and for twodifferent classes of programs, dense and sparse linear algebraapplications, that are a good representative of the real scientificapplications.
Complete list of metadata
Contributor : Abes Star :  Contact
Submitted on : Thursday, January 21, 2016 - 9:01:07 PM
Last modification on : Thursday, November 19, 2020 - 12:59:58 PM
Long-term archiving on: : Friday, November 11, 2016 - 3:02:13 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01248109, version 2



Luka Stanisic. A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic HPC Applications. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2015. English. ⟨NNT : 2015GREAM035⟩. ⟨tel-01248109v2⟩



Record views


Files downloads