Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators

Junjie Lai 1
1 ALF - Amdahl's Law is Forever
Inria Rennes – Bretagne Atlantique , IRISA-D3 - ARCHITECTURE
Abstract : This thesis work is funded by the ANR PetaQCD project. We have mainly worked on two topics in GPU performance analysis. First, we designed an approach that is simple enough for developers to use and provides more insight into performance results. Second, we designed an approach to estimate the performance upper bound of an application on GPUs and to guide performance optimization.

The first part of the thesis work was presented at the Rapido '12 workshop. We developed an analytical method and a timing estimation tool (TEG) to predict the performance of CUDA applications on GT200-generation GPUs. TEG parses the assembly code of GPU kernels and collects information including instruction types, operands, etc. TEG can then predict GPU application performance at a cycle-approximate level, using the instruction trace and other information collected from the Barra simulator. TEG also makes it possible to quantify the penalties of some performance bottlenecks.

The second main part of this thesis is going to be presented at the CGO '13 conference. We developed an approach to estimate the performance upper bound of GPU applications based on application analysis and assembly-code-level benchmarking. Knowing the performance upper bound of an application, we know how much optimization space is left and can decide how much optimization effort is worthwhile. The analysis also shows which parameters are critical to performance. As an example, we analyzed the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. Guided by this analysis and using the native assembly language, our SGEMM implementations achieve, on average, about 5% better performance than CUBLAS in the CUDA 4.1 SDK for large matrices on the GTX580. The achieved performance is around 90% of the estimated upper bound for SGEMM on the GTX580.

Cited literature [86 references]
Contributor : André Seznec
Submitted on : Monday, November 25, 2013 - 9:13:12 AM
Last modification on : Thursday, January 7, 2021 - 4:25:26 PM
Long-term archiving on : Wednesday, February 26, 2014 - 4:24:20 AM


  • HAL Id : tel-00908579, version 1


Junjie Lai. Throughput Oriented Analytical Models for Performance Estimation on Programmable Accelerators. Hardware Architecture [cs.AR]. Université de Rennes I, 2013. English. ⟨tel-00908579⟩


