Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience

Feng Li 1
1 Parkas - Parallélisme de Kahn Synchrone
CNRS - Centre National de la Recherche Scientifique : UMR 8548, Inria Paris-Rocquencourt, DI-ENS - Département d'informatique de l'École normale supérieure
Abstract : Across the wide range of multiprocessor architectures, all seem to share one common problem: they are hard to program. It is a general belief that parallelism is a software problem, and that perhaps we need more sophisticated compilation techniques to partition the application into concurrent threads. Many experts also make the point that the underlining architecture plays an equally important architecture before one may expect significant progress in the programmability of multiprocessors. Our approach favors a convergence of these viewpoints. The convergence of dataflow and von Neumann architecture promises latency tolerance, the exploitation of a high degree of parallelism, and light thread switching cost. Multithreaded dataflow architectures require a high degree of parallelism to tolerate latency. On the other hand, it is error-prone for programmers to partition the program into large number of fine grain threads. To reconcile these facts, we aim to advance the state of the art in automatic thread partitioning, in combination with programming language support for coarse-grain, functionally deterministic concurrency. This thesis presents a general thread partitioning algorithm for transforming sequential code into a parallel data-flow program targeting a multithreaded dataflow architecture. Our algorithm operates on the program dependence graph and on the static single assignment form, extracting task, pipeline, and data parallelism from arbitrary control flow, and coarsening its granularity using a generalized form of typed fusion. We design a new intermediate representation to ease code generation for an explicit token match dataflow execution model. We also implement a GCC-based prototype. We also evaluate coarse-grain dataflow extensions of OpenMP in the context of a large-scale 1024-core, simulated multithreaded dataflow architecture. These extension and simulated architecture allow the exploration of innovative memory models for dataflow computing. We evaluate these tools and models on realistic applications.
Document type :
Complete list of metadatas

Cited literature [78 references]  Display  Hide  Download
Contributor : Abes Star <>
Submitted on : Wednesday, September 10, 2014 - 11:12:09 AM
Last modification on : Thursday, February 7, 2019 - 1:32:36 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00992753, version 2


Feng Li. Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience. Other [cs.OH]. Université Pierre et Marie Curie - Paris VI, 2014. English. ⟨NNT : 2014PA066102⟩. ⟨tel-00992753v2⟩



Record views


Files downloads