Combining static and dynamic approaches to model loop performance in HPC

Abstract : The complexity of CPUs has increased considerably since their beginnings, introducing mechanisms such as register renaming, out-of-order execution, vectorization, prefetchers and multi-core environments to keep performance rising with each product generation. However, so has the difficulty in making proper use of all these mechanisms, or even evaluating whether one’s program makes good use of a machine, whether users’ needs match a CPU’s design, or, for CPU architects, knowing how each feature really affects customers. This thesis focuses on increasing the observability of potential bottlenecks in HPC computational loops and how they relate to each other in modern microarchitectures. We will first introduce a framework combining CQA and DECAN (respectively static and dynamic analysis tools) to get detailed performance metrics on small codelets in various execution scenarios. We will then present PAMDA, a performance analysis methodology leveraging elements obtained from codelet analysis to detect potential performance problems in HPC applications and help resolve them. We will also describe work extending the Cape linear model to better cover Sandy Bridge and give it more flexibility for HW/SW codesign purposes. It will be directly used in VP3, a tool evaluating the performance gains that vectorizing loops could provide. Finally, we will describe UFS, an approach combining static analysis and cycle-accurate simulation to very quickly estimate a loop’s execution time while accounting for out-of-order limitations in modern CPUs.

Cited literature : 118 references
Contributor : Abes Star
Submitted on : Thursday, March 24, 2016 - 11:02:35 AM
Last modification on : Friday, January 10, 2020 - 3:42:22 PM
Document(s) archived on : Saturday, June 25, 2016 - 2:12:40 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01293040, version 1



Vincent Palomares. Combining static and dynamic approaches to model loop performance in HPC. Hardware Architecture [cs.AR]. Université de Versailles-Saint Quentin en Yvelines, 2015. English. ⟨NNT : 2015VERS040V⟩. ⟨tel-01293040⟩


