Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators

Alexandru Plesco 1
1 COMPSYS - Compilation and embedded computing systems
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : Many commercial products, notably in telecommunications and multimedia, offer increasingly advanced features and functionalities, at the cost of increased design complexity. To meet performance and power budgets, these features can be implemented with dedicated hardware accelerators. Traditional hardware design methodologies are not sufficient to meet the required time-to-market and development cost, and high-level synthesis (HLS) tools are an appealing alternative. These tools are now becoming more mature at generating hardware accelerators with an optimized internal structure, thanks to efficient scheduling techniques, resource sharing, and finite-state machine generation. However, interfacing them with the outside world, i.e., integrating the automatically generated hardware accelerators within the complete design, with optimized communications, so that they achieve the best throughput, remains a very hard task, reserved for expert designers. The leitmotiv of this thesis was to study and develop source-to-source strategies to improve the design of these interfaces, treating the HLS tool as a back-end for more advanced front-end transformations.
In the first part of the thesis, as a case study, we designed by hand, in VHDL, an intelligent glue logic to interface a matrix-matrix multiplication accelerator generated by the MMAlpha HLS tool. Using data dependence information, we implemented double-buffering and blocking techniques on a scratchpad-like local SRAM memory to exploit data reuse. This significantly increased the performance of the system but also required a significant engineering effort. We then showed, on several multimedia applications and with another HLS tool, Spark, that the same benefit could be obtained with a preliminary semi-automatic source-to-source (here C-to-C) transformation step. For that, we used an advanced state-of-the-art compiler front-end, based on the Open64 compiler and the WRaP-IT framework for polyhedral transformations. Significant improvements were shown in particular on the synthesis of part of the video color space conversion from the MediaBench II benchmarks, for which data was fed through a processor cache memory. This study demonstrated the importance of loop transformations as a pre-processing step to HLS tools, but also the difficulty of using them, which depends on the features the HLS tool offers to express external communications.
In the second part of the thesis, using the C2H HLS tool from Altera, which can synthesize hardware accelerators communicating with an external DDR-SDRAM memory, we showed that it is possible to automatically restructure the application code, to generate adequate communication processes in C, and to compile them all with C2H, so that the resulting application is highly optimized, with full usage of the memory bandwidth. These transformations and optimizations, which combine techniques such as double buffering, array contraction, loop tiling, and software pipelining, among others, were incorporated in an automatic source-to-source transformation tool, called Chuba, based on the polyhedral model representation. Our study shows that HLS tools can indeed be used as back-end optimizers for front-end optimizations, as is the case for standard compilation, where high-level transformations are developed on top of assembly-code optimizers. We believe this is the way to go for making HLS tools viable.
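To make the blocking and double-buffering idea from the abstract concrete, here is a minimal C sketch of a tiled matrix-matrix multiplication with ping-pong tile buffers. It is an illustration only, not the thesis's VHDL glue logic and not output of Chuba. The names `N`, `T`, `load_tile`, and `matmul_tiled` are hypothetical, and the local arrays merely stand in for a scratchpad SRAM. In sequential C the prefetch and the computation run one after the other; the assumption is that a hardware implementation would overlap them.

```c
#include <assert.h>
#include <string.h>

#define N 8   /* matrix dimension (hypothetical value) */
#define T 4   /* tile size (hypothetical value); must divide N */

/* Copy tile (bi, bj) of a matrix in "external" memory into a local T x T buffer. */
static void load_tile(int src[N][N], int bi, int bj, int dst[T][T]) {
    for (int i = 0; i < T; i++)
        for (int j = 0; j < T; j++)
            dst[i][j] = src[bi * T + i][bj * T + j];
}

/* C = A * B, computed tile by tile with two buffers per operand. */
static void matmul_tiled(int A[N][N], int B[N][N], int C[N][N]) {
    /* Two buffers per operand: one is used for computing, one for prefetching. */
    static int bufA[2][T][T], bufB[2][T][T];

    assert(N % T == 0);
    memset(C, 0, sizeof(int) * N * N);

    for (int bi = 0; bi < N / T; bi++) {
        for (int bj = 0; bj < N / T; bj++) {
            int cur = 0;
            /* Prologue: fetch the first pair of tiles. */
            load_tile(A, bi, 0, bufA[cur]);
            load_tile(B, 0, bj, bufB[cur]);

            for (int bk = 0; bk < N / T; bk++) {
                int nxt = 1 - cur;
                /* Prefetch the next pair of tiles into the idle buffers. */
                if (bk + 1 < N / T) {
                    load_tile(A, bi, bk + 1, bufA[nxt]);
                    load_tile(B, bk + 1, bj, bufB[nxt]);
                }
                /* Compute on the current buffers; each loaded element is reused T times. */
                for (int i = 0; i < T; i++)
                    for (int j = 0; j < T; j++)
                        for (int k = 0; k < T; k++)
                            C[bi * T + i][bj * T + j] += bufA[cur][i][k] * bufB[cur][k][j];
                cur = nxt;  /* swap the roles of the two buffers */
            }
        }
    }
}
```

Calling `matmul_tiled(A, B, C)` on three `int[N][N]` arrays should produce the same result as the usual triple-loop product. The point of the structure is that all accesses to `A` and `B` inside the compute loops hit the small local buffers, while external memory is read only in whole-tile transfers.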
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00544349
Contributor : Alexandru Plesco
Submitted on : Tuesday, December 7, 2010 - 4:53:44 PM
Last modification on : Thursday, November 21, 2019 - 2:35:08 AM
Long-term archiving on : Monday, November 5, 2012 - 12:35:39 PM

Identifiers

  • HAL Id : tel-00544349, version 1

Citation

Alexandru Plesco. Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators. Other [cs.OH]. Ecole normale supérieure de lyon - ENS LYON, 2010. English. ⟨tel-00544349⟩
