
Contributions to Software Runtime for Clustered Manycores Applied to Embedded and High-Performance Applications

Abstract : Demand for computing power keeps growing and is increasingly hard to meet, especially in embedded systems such as autonomous cars, drones, and smartphones. New highly parallel and heterogeneous processors are emerging to answer this challenge. They operate in constrained environments with real-time, low-power, and safety requirements. Programming these new chips is time-consuming and difficult, which leads to high software development costs. The Kalray MPPA® processor is a competitive example of low-power supercomputing on a single chip. It integrates up to 288 VLIW cores grouped in 18 clusters, each fitted with shared local memory. These clusters are interconnected by a high-bandwidth network-on-chip and communicate through DMA engines. This processor is used for the experimental results of this thesis. We propose the AOS library, which enables high-performance communication and synchronization between distributed local memories on clustered manycores. AOS achieves 70% of the peak hardware throughput for transfers larger than 8 KB. We propose tools, based on AOS, for implementing static and dynamic dataflow programs in order to accelerate the development of parallel applications on clustered manycores. We propose an implementation of OpenVX for clustered manycores on top of AOS. OpenVX is a dataflow-based standard for developing computer vision and neural network applications. The proposed OpenVX implementation includes automatic optimizations such as data prefetching, which overlaps communication with computation, and kernel fusion, which avoids the main-memory bandwidth bottleneck. Results show super-linear speedups.

Cited literature [351 references]
Contributor : ABES STAR
Submitted on : Friday, May 17, 2019 - 12:19:07 PM
Last modification on : Tuesday, August 24, 2021 - 3:09:37 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02132613, version 1


Julien Hascoët. Contributions to Software Runtime for Clustered Manycores Applied to Embedded and High-Performance Applications. Embedded Systems. INSA de Rennes, 2018. English. ⟨NNT : 2018ISAR0029⟩. ⟨tel-02132613⟩


