
Automatic Decomposition of Parallel Programs for Optimization and Performance Prediction

Abstract : In high performance computing, benchmarks are used to evaluate architectures, compilers and optimizations. Standard benchmarks mostly come from industry and may have very long execution times, so evaluating a new architecture or an optimization is costly. Most benchmarks are composed of independent kernels, and users are usually interested in only a small subset of them. To make local optimizations faster and easier, we should find ways to extract kernels as standalone executables. Benchmarks also contain redundant computational kernels: some calculations bring no new information about the system under study, even though they are measured many times. By detecting similar operations and removing redundant kernels, we can reduce the benchmarking cost without losing information about the system. This thesis proposes a method to automatically decompose applications into small kernels called codelets. Each codelet is a standalone executable that can be replayed in different execution contexts to evaluate them. This thesis quantifies how much the decomposition method accelerates optimization and benchmarking processes. It also quantifies the benefits of local optimizations over global optimizations. Many related works aim to enhance the benchmarking process, notably machine learning approaches and sampling techniques. Decomposing applications into independent pieces is not a new idea: it has been successfully applied to sequential codes. In this thesis we extend it to parallel programs. Evaluating scalability or new micro-architectures is 25× faster with codelets than with full application executions. Codelets predict the execution time with an accuracy of 94% and find local optimizations that outperform the best global optimization by up to 1.06×.
Contributor : Mihail Popov
Submitted on : Thursday, December 8, 2016 - 3:42:00 PM
Last modification on : Friday, January 10, 2020 - 3:42:22 PM
Long-term archiving on : Thursday, March 23, 2017 - 7:54:20 AM


  • HAL Id : tel-01412638, version 1


Mihail Popov. Automatic Decomposition of Parallel Programs for Optimization and Performance Prediction. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Versailles Saint Quentin en Yvelines (UVSQ), France, 2016. English. ⟨NNT : 2016SACLV087⟩. ⟨tel-01412638v1⟩


