Performance Analysis Strategies for Task-based Applications on Hybrid Platforms

Abstract: Programming paradigms in High-Performance Computing have been shifting toward task-based models that adapt readily to heterogeneous and scalable supercomputers. The performance of task-based applications depends heavily on the scheduling heuristics of the runtime system and on its ability to exploit computing and communication resources. Unfortunately, traditional performance analysis strategies are ill-suited to fully understanding task-based runtime systems and applications: they expect regular behavior with distinct communication and computation phases, whereas task-based applications exhibit no clear phases. Moreover, the finer granularity of task-based applications typically induces stochastic behavior, which leads to irregular structures that are difficult to analyze. In this thesis, we propose performance analysis strategies that exploit the combination of application structure, scheduler, and hardware information. We show how these strategies help to understand performance issues of task-based applications running on hybrid platforms. Our strategies are built on top of modern data analysis tools, enabling the creation of custom visualization panels that help to understand and pinpoint performance problems caused by bad scheduling decisions and by incorrect runtime system and platform configuration. By combining simulation and debugging, we are also able to build a visual representation of the internal state and the estimations computed by the scheduler when it schedules a new task. We validate our proposal by analyzing traces from a Cholesky decomposition implemented with the StarPU task-based runtime system and running on hybrid (CPU/GPU) platforms.
Our case studies show how to improve task partitioning on multi-GPU, multi-core platforms to get closer to theoretical lower bounds, how to improve MPI pipelining on multi-node, multi-core, multi-GPU platforms to reduce the slow start in distributed nodes, and how to upgrade the runtime system to increase MPI bandwidth. By employing simulation and debugging strategies, we also provide a workflow to investigate in depth the assumptions behind the scheduler's decisions. This allows us to suggest changes that improve the scheduling and prefetch mechanisms of the runtime system.
Contributor: Arnaud Legrand
Submitted on: Monday, January 14, 2019 - 1:57:44 PM
Last modification on: Thursday, November 19, 2020 - 2:30:04 PM
Long-term archiving on: Monday, April 15, 2019 - 12:21:34 PM




HAL Id: tel-01962333, version 1


Vinícius Garcia Pinto. Performance Analysis Strategies for Task-based Applications on Hybrid Platforms. Distributed, Parallel, and Cluster Computing [cs.DC]. Universidade Federal do Rio Grande do Sul - UFRGS; UGA - Université Grenoble Alpes, 2018. English. ⟨tel-01962333⟩


