Performance Analysis Strategies for Task-based Applications on Hybrid Platforms

Abstract : Programming paradigms in High-Performance Computing have been shifting toward task-based models that are capable of adapting readily to heterogeneous and scalable supercomputers. The performance of task-based applications heavily depends on the runtime scheduling heuristics and on its ability to exploit computing and communication resources. Unfortunately, the traditional performance analysis strategies are unfit to fully understand task-based runtime systems and applications: they expect a regular behavior with communication and computation phases, while task-based applications demonstrate no clear phases. Moreover, the finer granularity of task-based applications typically induces a stochastic behavior that leads to irregular structures that are difficult to analyze.

In this thesis, we propose performance analysis strategies that exploit the combination of application structure, scheduler, and hardware information. We show how our strategies can help to understand performance issues of task-based applications running on hybrid platforms. Our performance analysis strategies are built on top of modern data analysis tools, enabling the creation of custom visualization panels that allow understanding and pinpointing performance problems incurred by bad scheduling decisions and incorrect runtime system and platform configuration. By combining simulation and debugging, we are also able to build a visual representation of the internal state and the estimations computed by the scheduler when scheduling a new task.

We validate our proposal by analyzing traces from a Cholesky decomposition implemented with the StarPU task-based runtime system and running on hybrid (CPU/GPU) platforms. Our case studies show how to enhance the task partitioning among the multi-(GPU, core) to get closer to theoretical lower bounds, how to improve MPI pipelining in multi-(node, core, GPU) to reduce the slow start in distributed nodes, and how to upgrade the runtime system to increase MPI bandwidth. By employing simulation and debugging strategies, we also provide a workflow to investigate, in depth, assumptions concerning the scheduler decisions. This allows us to suggest changes to improve the runtime system scheduling and prefetch mechanisms.
Contributor : Abes Star
Submitted on : Monday, March 11, 2019 - 2:53:10 PM
Last modification on : Wednesday, October 7, 2020 - 3:02:57 AM
Long-term archiving on : Wednesday, June 12, 2019 - 3:07:19 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02063804, version 1



Vinicius Garcia Pinto. Performance Analysis Strategies for Task-based Applications on Hybrid Platforms. Performance [cs.PF]. Université Grenoble Alpes; Universidade Federal do Rio Grande do Sul (Brésil), 2018. English. ⟨NNT : 2018GREAM058⟩. ⟨tel-02063804⟩


