Programmation des architectures hétérogènes à l'aide de tâches divisibles ou modulables

Abstract : Hybrid computing platforms equipped with accelerators are now commonplace in high performance computing. Because of this evolution, researchers have concentrated their efforts on designing tools that ease the programming of applications able to use all the computing units of such machines. The StarPU runtime system, developed in the STORM team at INRIA Bordeaux, was designed to be a target for parallel language compilers and specialized libraries (linear algebra, Fourier transforms, ...). To provide portability of both code and performance to applications, StarPU schedules dynamic task graphs efficiently on all the heterogeneous computing units of the machine. One of the most difficult aspects of expressing an application as a task graph is choosing the granularity of the tasks, which typically goes hand in hand with the size of the blocks used to partition the problem's data. Small granularities do not make efficient use of accelerators such as GPUs, which require a small number of tasks with massive inner data parallelism in order to reach peak performance. Conversely, processors typically exhibit optimal performance with a large number of tasks of smaller granularity. The choice of task granularity not only depends on the type of computing unit on which the task will be executed, but also influences the amount of parallelism available in the system: too many small tasks may flood the runtime system with overhead, whereas too few large tasks may create a parallelism deficiency. Currently, most approaches rely on finding a compromise granularity, which does not make optimal use of both CPU and accelerator resources. The objective of this thesis is to solve this granularity problem by aggregating resources, in order to view them not as many small resources but as fewer, larger ones collaborating on the execution of the same task.
One theoretical machine and scheduling model that represents this process has existed for several decades: parallel tasks. The main contributions of this thesis are to make practical use of this model by implementing a parallel task mechanism inside StarPU, and to implement and study parallel task schedulers from the literature. The model is validated by improving the programming and optimizing the execution of numerical applications on modern computing machines.
Document type :
Theses

Cited literature [52 references]

https://tel.archives-ouvertes.fr/tel-01816341
Contributor : Abes Star
Submitted on : Friday, June 15, 2018 - 10:32:23 AM
Last modification on : Friday, November 23, 2018 - 3:08:24 AM
Long-term archiving on : Monday, September 17, 2018 - 11:19:44 AM

File

COJEAN_TERRY_2018.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01816341, version 1

Citation

Terry Cojean. Programmation des architectures hétérogènes à l'aide de tâches divisibles ou modulables. Autre [cs.OH]. Université de Bordeaux, 2018. Français. ⟨NNT : 2018BORD0041⟩. ⟨tel-01816341⟩

Metrics

Record views : 318
File downloads : 182