Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators

Junjie Lai 1
1 ALF - Amdahl's Law is Forever
Inria Rennes – Bretagne Atlantique, IRISA-D3 - ARCHITECTURE
Abstract : This thesis work is funded by the ANR PetaQCD project. We have mainly worked on two topics in GPU performance analysis. First, we designed an approach that is simple enough for developers to use and provides more insight into performance results. Second, we designed an approach to estimate the performance upper bound of an application on GPUs and to guide performance optimization. The first part of the thesis work was presented at the Rapido '12 workshop. We developed an analytical method and a timing estimation tool (TEG) to predict the performance of CUDA applications on GT200-generation GPUs. TEG parses the assembly code of GPU kernels and collects information including instruction types, operands, etc. TEG can then predict the performance of GPU applications at a cycle-approximate level, using the instruction trace and other information collected from the Barra simulator. TEG also makes it possible to quantify the penalties of some performance bottlenecks. The second main part of this thesis is going to be presented at the CGO '13 conference. We developed an approach to estimate the performance upper bound of GPU applications based on application analysis and assembly-code-level benchmarking. With the performance upper bound of an application, we know how much optimization space is left and can decide how much optimization effort to invest. The analysis also shows which parameters are critical to performance. As an example, we analyzed the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. Guided by this analysis and using the native assembly language, our SGEMM implementations achieve on average about 5% better performance than CUBLAS in the CUDA 4.1 SDK for large matrices on the GTX580. The achieved performance is around 90% of the estimated upper bound for SGEMM on the GTX580.
Document type :
Theses

Cited literature [86 references]

https://tel.archives-ouvertes.fr/tel-00908579
Contributor : André Seznec
Submitted on : Monday, November 25, 2013 - 9:13:12 AM
Last modification on : Friday, November 16, 2018 - 1:39:34 AM
Document(s) archived on : Wednesday, February 26, 2014 - 4:24:20 AM

Identifiers

  • HAL Id : tel-00908579, version 1

Citation

Junjie Lai. Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators. Hardware Architecture [cs.AR]. Université de Rennes I, 2013. English. ⟨tel-00908579⟩
