A decoupled approach to high-level loop optimization : tile shapes, polyhedral building blocks and low-level compilers

Tobias Grosser 1
1 Parkas - Parallélisme de Kahn Synchrone
CNRS - Centre National de la Recherche Scientifique : UMR 8548, Inria Paris-Rocquencourt, DI-ENS - Département d'informatique de l'École normale supérieure
Abstract : Despite decades of research on high-level loop optimizations and their successful integration in production C/C++/FORTRAN compilers, most compiler-internal loop transformation systems only partially address the challenges posed by the increased complexity and diversity of today's hardware. Many existing loop optimization frameworks struggle to exploit such hardware, especially when domain-specific knowledge is needed to obtain optimal code for complex targets such as accelerators or many-core processors. As a result, new domain-specific optimization schemes are developed independently, without taking advantage of existing loop optimization technology. This results both in missed optimization opportunities and in low portability of these optimization schemes to different compilers. One area where we see the need for better optimizations is iterative stencil computations, an important computational problem that is regularly optimized by specialized, domain-specific compilers, but for which generating efficient code is difficult.

In this work we present new domain-specific optimization strategies that enable the generation of high-performance GPU code for stencil computations. Unlike most existing domain-specific compilers, we decouple the high-level optimization strategy from the low-level optimization and specialization necessary to yield optimal performance. As the high-level optimization scheme we present a new formulation of split tiling, a tiling technique that ensures reuse along the time dimension as well as balanced coarse-grained parallelism without the need for redundant computations. Using split tiling, we show how to integrate a domain-specific optimization into a general-purpose C-to-CUDA translator, an approach that allows us to reuse existing non-domain-specific optimizations.
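The split tiling idea can be illustrated on a one-dimensional, three-point Jacobi stencil. The Python sketch below is not the formulation developed in this thesis; it assumes fixed boundary cells, a time-block height `T`, a tile width `W` with `W >= 2T`, and two alternating buffers, and all function names are hypothetical. In phase 1 every tile computes a trapezoid that shrinks by one point per side and time step; in phase 2 inverted trapezoids centred on the inner tile boundaries fill in the remaining points. The tiles of one phase do not depend on each other, so each phase could run in parallel without redundant computation; the sketch simply runs them in sequence and is checked against a naive time loop.

```python
def update(prev, i):
    # three-point Jacobi update; boundary cells 0 and n-1 are never updated
    return (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0


def jacobi_naive(a, steps):
    cur = list(a)
    for _ in range(steps):
        nxt = list(cur)  # copies the fixed boundary cells
        for i in range(1, len(cur) - 1):
            nxt[i] = update(cur, i)
        cur = nxt
    return cur


def jacobi_split_tiled(a, steps, T, W):
    """Hypothetical split-tiled schedule: time blocks of height T, tiles of width W."""
    assert W >= 2 * T, "the two tile shapes only cover every point if W >= 2T"
    n = len(a)
    buf = [list(a), list(a)]       # buf[t % 2] holds time step t
    starts = list(range(0, n, W))  # left edges of the phase-1 tiles
    last = len(starts) - 1
    t0 = 0
    while t0 < steps:
        h = min(T, steps - t0)     # the final time block may be shorter
        # Phase 1: one shrinking trapezoid per tile (no shrinking at the array ends).
        for k, left in enumerate(starts):
            right = min(left + W, n)
            for s in range(1, h + 1):
                src, dst = buf[(t0 + s - 1) % 2], buf[(t0 + s) % 2]
                lo = 1 if k == 0 else left + s
                hi = n - 1 if k == last else right - s
                for i in range(lo, hi):
                    dst[i] = update(src, i)
        # Phase 2: one growing (inverted) trapezoid per inner tile boundary.
        for b in starts[1:]:
            for s in range(1, h + 1):
                src, dst = buf[(t0 + s - 1) % 2], buf[(t0 + s) % 2]
                for i in range(max(b - s, 1), min(b + s, n - 1)):
                    dst[i] = update(src, i)
        t0 += h
    return buf[steps % 2]
```

For example, `jacobi_split_tiled(a, 10, T=3, W=8)` returns the same list as `jacobi_naive(a, 10)`. Two buffers suffice here because, under the `W >= 2T` assumption, no trapezoid overwrites a value that a neighbouring inverted trapezoid still needs.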
We then evolve split tiling into a hybrid hexagonal/parallelogram tiling scheme that allows us to generate code that addresses GPU-specific concerns even better. To conclude our work on tiling schemes, we investigate the relation between diamond and hexagonal tiling. Starting with a detailed analysis of diamond tiling, including the requirements it poses on tile sizes and wavefront coefficients, we provide a unified formulation of hexagonal and diamond tiling, which enables us to perform hexagonal tiling for two-dimensional problems (one time and one space dimension) in the context of a general-purpose optimizer such as Pluto. Finally, we use this formulation to evaluate hexagonal and diamond tiling in terms of compute-to-communication and compute-to-synchronization ratios.

In the second part of this work, we discuss our contributions to important infrastructure components, our building blocks, which enable us to decouple our high-level optimizations both from the necessary code generation optimizations and from the compiler infrastructure we apply the optimizations to. We start by presenting a new polyhedral extractor that obtains a polyhedral representation from a piece of C code, widening the supported C code to exploit the full generality of Presburger arithmetic and taking special care to model language semantics even in the presence of defined integer wrapping. As a next step, we present a new polyhedral AST generation approach, which extends AST generation beyond classical control flow generation by allowing the generation of user-provided mappings. Through a fine-grained option mechanism, we give the user fine-grained control over AST generator decisions, and we add extensive support for specialization, e.g., with a new generalized form of polyhedral unrolling.
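In C, unsigned integer arithmetic is defined to wrap around modulo 2^n (signed overflow, by contrast, is undefined), so an extractor cannot model an unsigned `i + 1` as unbounded integer addition. The Python sketch below assumes an 8-bit `unsigned char` and uses `ctypes` only to emulate the value C stores. It checks that the wrapped sum remains expressible in Presburger arithmetic by introducing one extra integer variable `q`, and it shows how wrapping changes a loop's trip count. The function names are hypothetical; this is not the extractor described in the thesis.

```python
import ctypes

MOD = 1 << 8  # unsigned char (assumed 8 bits) holds values in [0, 256)


def c_add_u8(x, y):
    # value stored when x + y is assigned to an unsigned char (ctypes truncates like C)
    return ctypes.c_uint8(x + y).value


def presburger_add_u8(x, y):
    # affine model: r = x + y - 256*q, where the integer q is fixed by 0 <= r < 256
    q = (x + y) // MOD
    r = x + y - MOD * q
    assert 0 <= r < MOD
    return r


def trip_count_model(start, stop):
    # iterations of: for (unsigned char i = start; i != stop; i++)
    # wrapping makes this (stop - start) mod 256 rather than a possibly negative number
    return (stop - start) % MOD


def trip_count_simulated(start, stop):
    # step the loop counter exactly as the C loop would
    i, n = start, 0
    while i != stop:
        i = c_add_u8(i, 1)
        n += 1
    return n
```

For `start = 250` and `stop = 4`, both trip-count functions give 10 (the values 250 to 255, then 0 to 3), whereas a model that treats `i` as an unbounded integer would never reach 4.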
To facilitate the implementation of polyhedral transformations, we present a new schedule representation, schedule trees, which makes the inherent tree structure of schedules explicit to simplify working with complex polyhedral schedules.

The last part of this work looks at our contributions to low-level compilers.
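To illustrate what making the tree structure of a schedule explicit can look like, here is a toy schedule tree in Python. The node kinds (band, filter, sequence, leaf) are modelled loosely on the schedule tree nodes found in isl. The classes, the `execution_order` function, and the restriction to one-dimensional bands over instances `(statement, i)` are simplifying assumptions for illustration, and the statement instances that a domain node would describe are passed in as a plain list. The two example trees run statements `S` and `T` either as two separate loops or fused into one loop. A flat schedule encodes the same difference only through the position of a constant dimension, e.g. `{S[i] -> [0, i]; T[i] -> [1, i]}` versus `{S[i] -> [i, 0]; T[i] -> [i, 1]}`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union

Instance = Tuple[str, int]  # (statement name, loop iterator), e.g. ("S", 2)


@dataclass
class Leaf:
    pass


@dataclass
class Band:
    schedule: Callable[[Instance], int]  # one-dimensional partial schedule
    child: "Node" = field(default_factory=Leaf)


@dataclass
class Filter:
    statements: frozenset  # only instances of these statements enter the subtree
    child: "Node" = field(default_factory=Leaf)


@dataclass
class Sequence:
    children: List[Filter]  # subtrees run one after another, in list order


Node = Union[Leaf, Band, Filter, Sequence]


def execution_order(node: Node, instances: List[Instance]) -> List[Instance]:
    if isinstance(node, Leaf):
        return list(instances)  # no further ordering constraints
    if isinstance(node, Filter):
        return execution_order(node.child, [x for x in instances if x[0] in node.statements])
    if isinstance(node, Sequence):
        return [x for f in node.children for x in execution_order(f, instances)]
    if isinstance(node, Band):
        groups = {}  # band value -> instances scheduled at that value
        for x in instances:
            groups.setdefault(node.schedule(x), []).append(x)
        return [y for t in sorted(groups) for y in execution_order(node.child, groups[t])]
    raise TypeError(f"unknown node {node!r}")


def by_i(x: Instance) -> int:
    return x[1]


domain = [(s, i) for s in ("S", "T") for i in range(3)]
# Two separate loops: sequence at the root, one band per statement.
split = Sequence([Filter(frozenset({"S"}), Band(by_i)), Filter(frozenset({"T"}), Band(by_i))])
# One fused loop: a single band at the root, the sequence below it.
fused = Band(by_i, Sequence([Filter(frozenset({"S"})), Filter(frozenset({"T"}))]))
```

Here `execution_order(split, domain)` yields S0, S1, S2, T0, T1, T2, and `execution_order(fused, domain)` yields S0, T0, S1, T1, S2, T2. Switching between the two is a local change to the tree (moving the band above the sequence).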

Cited literature [121 references]

Contributor : Abes Star
Submitted on : Wednesday, April 22, 2015 - 1:00:09 AM
Last modification on : Thursday, February 7, 2019 - 1:33:23 AM
Long-term archiving on : Monday, September 14, 2015 - 12:15:33 PM


Version validated by the jury (STAR)


HAL Id : tel-01144563, version 1


Tobias Grosser. A decoupled approach to high-level loop optimization : tile shapes, polyhedral building blocks and low-level compilers. Programming Languages [cs.PL]. Université Pierre et Marie Curie - Paris VI, 2014. English. ⟨NNT : 2014PA066270⟩. ⟨tel-01144563⟩


