Profiling and debugging by efficient tracing of hybrid multi-threaded HPC applications.

Abstract : The evolution of supercomputers raises both hardware and software challenges. In the quest for the highest computing power, the interdependence between simulation components has a growing impact and calls for new approaches. This thesis focuses on the software development aspect, and particularly on the observation of parallel software running on several thousand cores. This observation aims to give developers the feedback they need when running a program on an execution substrate that has not yet been modeled because of its complexity. To this end, we first introduce the development process from a global point of view, then describe developer tools and related work. We then present our contribution, a trace-based profiling and debugging tool, and its evolution towards an on-line coupling method which, as we will show, is more scalable because it overcomes I/O limitations. Our contribution also covers a time-stamp synchronisation algorithm for tracing purposes, which relies on a probabilistic approach with quantified error. We also present a tool for characterising a machine from the MPI standpoint and demonstrate the presence of machine noise for both point-to-point and collective operations, justifying the use of an empirical approach. In summary, this work proposes and motivates an alternative approach to trace-based event collection that preserves event granularity while keeping overhead low.
Cited literature [133 references]

https://tel.archives-ouvertes.fr/tel-01102639
Contributor : Jean-Baptiste Besnard
Submitted on : Tuesday, January 13, 2015 - 11:50:15 AM
Last modification on : Monday, October 15, 2018 - 4:20:02 PM
Long-term archiving on : Tuesday, April 14, 2015 - 10:41:46 AM

Identifiers

  • HAL Id : tel-01102639, version 1

Collections

CEA | PRISM | UVSQ | DAM

Citation

Jean-Baptiste Besnard. Profiling and debugging by efficient tracing of hybrid multi-threaded HPC applications. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Versailles Saint Quentin en Yvelines, 2014. English. ⟨tel-01102639⟩

Metrics

Record views : 410
File downloads : 659