A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic Task-based Scientific Applications

Luka Stanisic 1, 2, 3
1 CORSE - Compiler Optimization and Run-time Systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
3 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : The evolution of High-Performance Computing systems has taken a sharp turn in the last decade. Due to the enormous energy consumption of modern platforms, miniaturization and frequency scaling of processors have reached a limit. These energy constraints have forced hardware manufacturers to develop alternative computer architectures in order to meet the ever-growing demand for performance from scientists and society. However, programming such a diversity of platforms efficiently and fully exploiting the potential of the many different resources they offer is extremely challenging. The previously dominant approach to designing high-performance applications, based on large monolithic codes offering many optimization opportunities, has therefore become increasingly hard to apply, since such complex codes are very difficult to implement and maintain. As a result, application developers increasingly consider modular approaches and dynamic application execution. A popular approach is to implement the application at a high level, independently of the hardware architecture, as Directed Acyclic Graphs (DAGs) of tasks, where each task corresponds to a computation kernel carefully optimized for each architecture. A runtime system can then dynamically schedule these tasks on the different computing resources. Developing such solutions and ensuring their good performance on a wide range of setups is, however, very challenging. Because of the high complexity of the hardware, the variability in the duration of the operations performed on a machine, and the dynamic scheduling of tasks, application executions are non-deterministic, and the performance evaluation of such systems is extremely difficult. There is therefore a definite need for systematic and reproducible methods for conducting such research, as well as for reliable performance evaluation techniques for studying these complex systems.
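To make the task-based model described above more concrete, here is a minimal Python sketch of a DAG of tasks in which each task has a cost estimate per architecture, together with a greedy scheduler that places each ready task on the worker expected to finish it earliest. This is a generic earliest-finish-time heuristic chosen for illustration. It is not StarPU's API or any of its actual scheduling policies. The task names, cost values and worker names are hypothetical, and data-transfer costs are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical per-architecture duration estimates, e.g. {"cpu": 4.0, "gpu": 1.0}
    cost: dict
    # Names of tasks that must finish before this one can start (DAG edges)
    deps: list = field(default_factory=list)


def schedule(tasks, workers):
    """Greedy list scheduling of a task DAG on heterogeneous workers.

    `workers` maps a worker name to its architecture. Each ready task is placed
    on the worker that would complete it earliest. Data transfers are ignored.
    Returns {task name: (worker, start, end)}.
    """
    by_name = {t.name: t for t in tasks}
    pending = {t.name: len(t.deps) for t in tasks}   # unfinished dependencies
    children = {t.name: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            children[d].append(t.name)

    worker_free = {w: 0.0 for w in workers}          # time each worker becomes idle
    finish, placement = {}, {}
    ready = [n for n, c in pending.items() if c == 0]

    while ready:
        task = by_name[ready.pop(0)]
        deps_done = max((finish[d] for d in task.deps), default=0.0)
        best = None
        for w, arch in workers.items():
            if arch not in task.cost:                 # no kernel for this architecture
                continue
            start = max(worker_free[w], deps_done)
            end = start + task.cost[arch]
            if best is None or end < best[0]:
                best = (end, w, start)
        if best is None:
            raise ValueError(f"no worker can run task {task.name}")
        end, w, start = best
        worker_free[w] = end
        finish[task.name] = end
        placement[task.name] = (w, start, end)
        # Release children whose dependencies are now all satisfied
        for c in children[task.name]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)

    if len(placement) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return placement


# Hypothetical example: A feeds B and C, which both feed D.
example_tasks = [
    Task("A", {"cpu": 4.0, "gpu": 1.0}),
    Task("B", {"cpu": 2.0, "gpu": 3.0}, ["A"]),
    Task("C", {"cpu": 5.0, "gpu": 1.0}, ["A"]),
    Task("D", {"cpu": 1.0, "gpu": 1.0}, ["B", "C"]),
]
example_workers = {"cpu0": "cpu", "gpu0": "gpu"}

if __name__ == "__main__":
    for name, (w, s, e) in schedule(example_tasks, example_workers).items():
        print(f"{name}: {w} [{s}, {e}]")
```

With these made-up costs, A and C go to the GPU, B and D go to the CPU, and the makespan is 4.0. If a single cost estimate changed, a different placement could result. This sensitivity is one reason why the executions of such dynamically scheduled applications are hard to reproduce and to evaluate.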
In this thesis, we show that it is possible to perform a clean, coherent, and reproducible study of dynamic HPC applications using simulation. We propose a unique workflow, based on two well-known and widely used tools, Git and Org-mode, for conducting reproducible experimental research. This simple workflow allows us to pragmatically address issues such as provenance tracking and data analysis replication. Our contribution to the performance evaluation of dynamic HPC applications consists of the design and validation of a coarse-grain hybrid simulation/emulation of StarPU, a dynamic task-based runtime for hybrid architectures, over SimGrid, a versatile simulator for distributed systems. We show how this tool can achieve faithful performance predictions of native executions on a wide range of heterogeneous machines and for two different classes of programs, dense and sparse linear algebra applications, which are good representatives of real scientific applications.

https://tel.archives-ouvertes.fr/tel-01248109
Contributor : Arnaud Legrand
Submitted on : Wednesday, December 23, 2015 - 5:55:39 PM
Last modification on : Thursday, October 24, 2019 - 10:35:57 AM
Long-term archiving on: Thursday, March 24, 2016 - 1:01:51 PM

Identifiers

  • HAL Id : tel-01248109, version 1

Collections

CNRS | LIG | UGA

Citation

Luka Stanisic. A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic Task-based Scientific Applications. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2015. English. ⟨tel-01248109v1⟩

