
Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators

Junjie Lai 1 
1 ALF - Amdahl's Law is Forever
Inria Rennes – Bretagne Atlantique, IRISA-D3 - ARCHITECTURE
Abstract: This thesis work is funded by the ANR PetaQCD project. We have mainly worked on two topics in GPU performance analysis. First, we designed an approach that is simple enough for developers to use and provides more insight into performance results. Second, we designed an approach to estimate the performance upper bound of an application on GPUs and to guide performance optimization.

The first part of the thesis work was presented at the Rapido '12 workshop. We developed an analytical method and a timing estimation tool (TEG) to predict the performance of CUDA applications on GT200-generation GPUs. TEG parses the assembly code of GPU kernels and collects information including instruction types, operands, etc. Using the instruction trace and other information collected from the Barra simulator, TEG can then predict GPU applications' performance at a cycle-approximate level. TEG also makes it possible to quantify the penalties of some performance bottlenecks.

The second main part of this thesis is going to be presented at the CGO '13 conference. We developed an approach to estimate GPU applications' performance upper bound based on application analysis and assembly-code-level benchmarking. Knowing an application's performance upper bound tells us how much optimization space is left, which helps decide how much optimization effort to invest. The analysis also reveals which parameters are critical to performance. As an example, we analyzed the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. Guided by this analysis and using the native assembly language, our SGEMM implementations achieve on average about 5% better performance than CUBLAS in the CUDA 4.1 SDK for large matrices on the GTX580. The achieved performance is around 90% of the estimated upper bound of SGEMM performance on the GTX580.
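To illustrate the kind of reasoning behind an instruction-level upper bound, here is a minimal Python sketch. It is not the model from the thesis: it assumes that instruction issue is the only limiting resource, so every non-FFMA instruction in the SGEMM inner loop takes a slot that could otherwise have been an FFMA, and it counts only FFMA and shared-memory load (LDS) instructions. The names `GpuConfig`, `ffma_fraction` and `sgemm_upper_bound_gflops`, and the parameters `bm`, `bn` (per-thread register tile) and `lds_width` (floats fetched per LDS instruction), are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuConfig:
    """Hypothetical GPU description; all numbers are supplied by the caller."""
    num_sms: int
    cores_per_sm: int
    clock_ghz: float

    def peak_sp_gflops(self) -> float:
        # One FFMA counts as 2 floating-point operations per core per cycle.
        return 2.0 * self.num_sms * self.cores_per_sm * self.clock_ghz


def ffma_fraction(bm: int, bn: int, lds_width: int) -> float:
    """Fraction of issued inner-loop instructions that are FFMA (simplified).

    Per k-step, a thread holding a bm x bn register tile of C performs bm*bn
    FFMAs and loads bm + bn operands from shared memory, using LDS
    instructions that each fetch `lds_width` floats. Address arithmetic,
    branches, global loads and synchronization are ignored.
    """
    ffma = bm * bn
    lds = (bm + bn) / lds_width
    return ffma / (ffma + lds)


def sgemm_upper_bound_gflops(gpu: GpuConfig, bm: int, bn: int, lds_width: int) -> float:
    # Assumption: achievable throughput = peak scaled by the FFMA share of issued instructions.
    return gpu.peak_sp_gflops() * ffma_fraction(bm, bn, lds_width)


if __name__ == "__main__":
    # GTX580 published configuration: 16 SMs x 32 cores, 1.544 GHz shader clock.
    gtx580 = GpuConfig(num_sms=16, cores_per_sm=32, clock_ghz=1.544)
    for bm, bn, width in [(4, 4, 1), (6, 6, 2), (8, 8, 4)]:
        bound = sgemm_upper_bound_gflops(gtx580, bm, bn, width)
        print(f"tile {bm}x{bn}, LDS width {width}: {bound:.1f} GFLOPS "
              f"({100 * ffma_fraction(bm, bn, width):.1f}% of peak)")
```

With the GTX580's configuration, a 6x6 register tile loaded with 64-bit shared-memory loads (two floats per instruction) gives an FFMA fraction of 6/7, roughly 86% of the theoretical peak. Larger tiles or wider loads raise this bound, but real kernels are also constrained by register count, shared-memory bandwidth and issue rules that this sketch deliberately leaves out.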

Cited literature: 86 references
Contributor: André Seznec
Submitted on: Monday, November 25, 2013 - 9:13:12 AM
Last modification on: Monday, June 27, 2022 - 3:02:08 AM
Long-term archiving on: Wednesday, February 26, 2014 - 4:24:20 AM


HAL Id: tel-00908579, version 1


Junjie Lai. Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators. Hardware Architecture [cs.AR]. Université de Rennes I, 2013. English. ⟨NNT : ⟩. ⟨tel-00908579⟩


