Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations

Matthieu Dorier 1, 2
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : Million-core supercomputers have become a reality in 2012 with LLNL's Sequoia supercomputer. Following Moore's law, Exascale machines (capable of 10^18 floating point operations per second) are expected by 2018. Such an immense computational power is used in many research areas, including earth sciences, biology, climate, or cosmology, where large-scale simulations are conducted to understand physical phenomena better. These simulations aim to replace real experiments that are either too expensive, irreproducible or simply unfeasible. But larger simulations on larger machines lead to the production of larger amounts of data. These data need to be efficiently stored and processed in order to retrieve scientific insights. The traditional approach to data management consists of storing the output of the simulation during its run, moving it, and analyzing it offline later. With an increasing gap between the performance of storage systems and the computation capabilities of recent post-Petascale supercomputers, this approach becomes unsustainable. This Ph.D. thesis explores new approaches to data management for post-Petascale supercomputers. We first introduce the Damaris approach, which leverages the multicore nature of recent machines to offload data-management tasks into dedicated cores. We study in particular how Damaris can be used to hide the variability in I/O (Input/Output) performance, and to provide in situ visualization capabilities to simulations in a way that does not impact their performance. We then use Damaris to evaluate the energy consumption of various data management approaches, including the use of dedicated I/O nodes. We then study the effect of multi-application I/O contention on the performance of the storage system. We propose the CALCioM approach, which provides a coordination layer between distinct applications to mitigate I/O interference.
In regard to access patterns, it has been observed that most applications have a repetitive behavior with respect to I/O, and that a model of this behavior can be useful to many systems (including CALCioM, but also any scheduler, caching or prefetching system). Based on this, we propose Omnisc'IO, an approach that leverages grammars to predict the spatial and temporal access patterns of HPC simulations at run time. This thesis includes results of experiments conducted with real scientific simulations, including CM1, GTC, LAMMPS and Nek5000, on real petascale and post-petascale platforms, including NCSA's Blue Waters, ORNL's Titan, NICS's Kraken and ANL's Intrepid.
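
To make the idea of exploiting repetitive I/O behavior concrete, here is a minimal Python sketch. It is not the grammar-based model of Omnisc'IO described in the thesis: it substitutes a much simpler periodicity check, and the function name `predict_next` and the operation labels in the trace are hypothetical, chosen only for illustration. Given the operations observed so far, it looks for the shortest block that has just been repeated twice in a row and guesses that the block is about to start again.

```python
def predict_next(history):
    """Guess the next I/O operation from a list of past operations.

    Simplified stand-in for a real access-pattern model: find the shortest
    period p such that the last p operations equal the p operations just
    before them, then predict that the repeated block starts over.
    Returns None when no repetition has been observed yet.
    """
    n = len(history)
    for p in range(1, n // 2 + 1):
        # Compare the trailing block with the block that precedes it.
        if history[n - p:] == history[n - 2 * p:n - p]:
            # The next operation is the first one of the repeated block.
            return history[n - p]
    return None


# Hypothetical trace: two identical output phases of a simulation.
trace = ["open", "write", "write", "close",
         "open", "write", "write", "close"]
prediction = predict_next(trace)
print(prediction)
```

A model of this kind only captures strictly periodic behavior; the abstract's point is that a richer model of the same repetition can serve schedulers, caches and prefetchers alike.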

Cited literature : 130 references

https://tel.archives-ouvertes.fr/tel-01099105
Contributor : Matthieu Dorier
Submitted on : Wednesday, December 31, 2014 - 1:05:13 PM
Last modification on : Friday, November 16, 2018 - 1:40:48 AM
Long-term archiving on : Saturday, April 15, 2017 - 12:15:31 PM

Identifiers

  • HAL Id : tel-01099105, version 1

Citation

Matthieu Dorier. Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations. Distributed, Parallel, and Cluster Computing [cs.DC]. Ecole Normale Supérieure de Rennes, 2014. English. ⟨tel-01099105⟩

Metrics

Record views : 1167
File downloads : 1240