F. Li, A. Pop, and A. Cohen, Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs, IEEE Micro, vol.32, issue.4, pp.19-31, 2012.
DOI : 10.1109/MM.2012.49
URL : https://hal.archives-ouvertes.fr/hal-00906099

F. Li, B. Arnoux, and A. Cohen, A Compiler and Runtime System Perspective to Scalable Data-Flow Computing, 5th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00786831

F. Li, P. Antoniu, and A. Cohen, Advances in Parallel-Stage Decoupled Software Pipelining, Workshop on Intermediate Representations (WIR), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00870687

M. Solinas, R. M. Badia, F. Bodin, A. Cohen, P. Evripidou et al., The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices, 2013 Euromicro Conference on Digital System Design, 2013.
DOI : 10.1109/DSD.2013.39
URL : https://hal.archives-ouvertes.fr/hal-00920903

K. Trifunovic, A. Cohen, L. Razya, and F. Li, Elimination of memory-based dependences for loop-nest optimization and parallelization, GCC Research Opportunities Workshop (GROW'11), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00992740

K. Trifunovic, A. Cohen, D. Edelsohn, F. Li, T. Grosser et al., GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation, GCC Research Opportunities Workshop (GROW'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551516

R. Allen and K. Kennedy, Automatic translation of FORTRAN programs to vector form, ACM Transactions on Programming Languages and Systems, vol.9, issue.4, pp.491-542, 1987.
DOI : 10.1145/29873.29875

A. W. Appel, SSA is functional programming, ACM SIGPLAN Notices, vol.33, issue.4, pp.17-20, 1998.
DOI : 10.1145/278283.278285
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.3282

Arvind and D. E. Culler, Dataflow Architectures, Annual Review of Computer Science, vol.1, issue.1, pp.225-253, 1986.
DOI : 10.1146/annurev.cs.01.060186.001301

Arvind and D. E. Culler, The tagged token dataflow architecture (preliminary version), Tech. Rep., Laboratory for Computer Science, 1983.

Arvind, K. P. Gostelow, and W. Plouffe, Indeterminacy, monitors, and dataflow, ACM SIGOPS Operating Systems Review, vol.11, issue.5, pp.159-169, 1977.
DOI : 10.1145/1067625.806559

Arvind and R. S. Nikhil, Executing a program on the MIT tagged-token dataflow architecture, IEEE Transactions on Computers, vol.39, issue.3, pp.300-318, 1990.
DOI : 10.1109/12.48862

J. Backus, Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs, Communications of the ACM, vol.21, issue.8, pp.613-641, 1978.
DOI : 10.1145/359576.359579

W. Baxter and H. R. Bauer III, The program dependence graph and vectorization, Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '89, pp.1-11, 1989.
DOI : 10.1145/75277.75278

M. Beck, R. Johnson, and K. Pingali, From control flow to dataflow, Journal of Parallel and Distributed Computing, vol.12, issue.2, 1991.
DOI : 10.1016/0743-7315(91)90016-3

C. Böhm and G. Jacopini, Flow diagrams, Turing machines and languages with only two formation rules, Communications of the ACM, vol.9, issue.5, pp.366-371, 1966.
DOI : 10.1145/355592.365646

D. E. Culler and Arvind, Resource requirements of dataflow programs, ACM SIGARCH Computer Architecture News, vol.16, issue.2, pp.141-150, 1988.
DOI : 10.1145/633625.52417

D. E. Culler, A. Sah, K. E. Schauser, T. von Eicken, and J. Wawrzynek, Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine, ACM SIGARCH Computer Architecture News, vol.19, issue.2, pp.164-175, 1991.
DOI : 10.1145/106975.106990

R. Cytron, Doacross: Beyond vectorization for multiprocessors, Intl. Conf. on Parallel Processing (ICPP), 1986.

R. Cytron, J. Ferrante, and V. Sarkar, Experiences using control dependence in PTRAN, Selected papers of the second workshop on Languages and compilers for parallel computing, pp.186-212, 1990.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991.
DOI : 10.1145/115372.115320

A. Davis and R. Keller, Data Flow Program Graphs, Computer, vol.15, issue.2, pp.26-41, 1982.
DOI : 10.1109/MC.1982.1653939
URL : http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1285&context=hmc_fac_pub

J. Dennis and D. Misunas, Data processing apparatus for highly parallel execution of stored programs, US Patent 4,153,932, 1979.

J. Dennis, J. Fosseen, and J. Linderman, Data flow schemas, International Symposium on Theoretical Programming, pp.187-216, 1974.
DOI : 10.1007/3-540-06720-5_15

J. B. Dennis and G. R. Gao, An efficient pipelined dataflow processor architecture, Proceedings. SUPERCOMPUTING '88, pp.368-373, 1988.
DOI : 10.1109/SUPERC.1988.44674

J. B. Dennis and D. P. Misunas, A preliminary architecture for a basic data-flow processor, Proceedings of the 2nd annual symposium on Computer architecture, ISCA '75, pp.126-132, 1975.

J. Ferrante and M. Mace, On linearizing parallel code, Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, POPL '85, pp.179-190, 1985.
DOI : 10.1145/318593.318636

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1145/24039.24041

J. Ferrante, M. Mace, and B. Simons, Generating sequential code from parallel code, Proceedings of the 2nd international conference on Supercomputing, ICS '88, pp.582-592, 1988.
DOI : 10.1145/55364.55421

D. Gajski, D. Padua, D. Kuck, and R. Kuhn, A Second Opinion on Data Flow Machines and Languages, Computer, vol.15, issue.2, 1982.
DOI : 10.1109/MC.1982.1653942

J. L. Gaudiot and Y. H. Wei, Token relabeling in a tagged token dataflow architecture, IEEE Transactions on Computers, vol.38, pp.1225-1239, 1989.

F. Gindrand, A. Cohen, and F. Z. Nardelli, Definition, code generation, and formal verification of a software controlled cache coherence protocol, 2013.

R. Giorgi, Z. Popovic, and N. Puzovic, DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07), pp.263-270, 2007.
DOI : 10.1109/SBAC-PAD.2007.27

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud, The synchronous data flow programming language LUSTRE, Proceedings of the IEEE, vol.79, issue.9, pp.1305-1320, 1991.
DOI : 10.1109/5.97300

L. Hendren, X. Tang, Y. Zhu, S. Ghobrial, G. Gao et al., Compiling C for the EARTH multithreaded architecture, International Journal of Parallel Programming, vol.3, issue.6, pp.305-338, 1997.
DOI : 10.1007/BF02699905

R. A. Iannucci, Toward a dataflow/von Neumann hybrid architecture, ACM SIGARCH Computer Architecture News, vol.16, issue.2, pp.131-140, 1988.
DOI : 10.1145/633625.52416

G. Kahn, The semantics of a simple language for parallel programming, pp.471-475, 1974.

R. Karp and R. Miller, Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing, SIAM Journal on Applied Mathematics, vol.14, issue.6, pp.1390-1411, 1966.
DOI : 10.1137/0114108

R. A. Kelsey, A correspondence between continuation passing style and static single assignment form, Papers from the 1995 ACM SIGPLAN workshop on Intermediate representations, IR '95, pp.13-22, 1995.

K. Kennedy and J. R. Allen, Optimizing compilers for modern architectures: a dependence-based approach, 2002.

K. Kennedy and K. S. McKinley, Loop distribution with arbitrary control flow, Proceedings SUPERCOMPUTING '90, pp.407-416, 1990.
DOI : 10.1109/SUPERC.1990.130048

K. Kennedy and K. S. McKinley, Typed fusion with applications to parallel and sequential code generation, 1993.

D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe, Dependence graphs and compiler optimizations, Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '81, pp.207-218, 1981.
DOI : 10.1145/567532.567555

B. Lee and A. Hurson, Dataflow architectures and multithreading, Computer, vol.27, issue.8, pp.27-39, 1994.
DOI : 10.1109/2.303620
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6909

F. Li, P. Antoniu, and A. Cohen, Advances in Parallel-Stage Decoupled Software Pipelining, Workshop on Intermediate Representations (WIR), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00870687

F. Li, A. Pop, and A. Cohen, Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs, IEEE Micro, vol.32, issue.4, pp.19-31, 2012.
DOI : 10.1109/MM.2012.49
URL : https://hal.archives-ouvertes.fr/hal-00906099

W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss et al., POSH, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.158-167, 2006.
DOI : 10.1145/1122971.1122997

W. Najjar, L. Roh, and A. P. W. Böhm, An evaluation of medium-grain dataflow code, International Journal of Parallel Programming, vol.1, issue.2, pp.209-242, 1994.
DOI : 10.1007/BF02577733

R. S. Nikhil, Can dataflow subsume von Neumann computing?, ACM SIGARCH Computer Architecture News, vol.17, issue.3, pp.262-272, 1989.
DOI : 10.1145/74926.74955
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.472.9660

R. S. Nikhil, G. M. Papadopoulos, and Arvind, *T: a multithreaded massively parallel architecture, ACM SIGARCH Computer Architecture News, vol.20, issue.2, pp.156-167, 1992.
DOI : 10.1145/146628.139715

D. Nuzman and R. Henderson, Multi-platform Auto-vectorization, International Symposium on Code Generation and Optimization (CGO'06), pp.281-294, 2006.
DOI : 10.1109/CGO.2006.25

D. Nuzman, I. Rosen, and A. Zaks, Auto-vectorization of interleaved data for SIMD, Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '06, pp.132-143, 2006.

K. J. Ottenstein, R. A. Ballance, and A. B. Maccabe, The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages, Proc. of the ACM SIGPLAN 1990 Conf. on Programming Language Design and Implementation, PLDI '90, pp.257-271, 1990.

G. Ottoni, R. Rangan, A. Stoler, and D. I. August, Automatic Thread Extraction with Decoupled Software Pipelining, IEEE/ACM Intl. Symp. on Microarchitecture, IEEE Computer Society, pp.105-118, 2005.

G. M. Papadopoulos and D. E. Culler, Monsoon: An explicit token-store architecture, 1990.

G. M. Papadopoulos and K. R. Traub, Multithreading: A revisionist view of dataflow architectures, Proceedings of the 18th Annual International Symposium on Computer Architecture, ISCA '91, pp.342-351, 1991.

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

A. Pop and A. Cohen, A Stream-Computing Extension to OpenMP, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00551507

A. Pop and A. Cohen, A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, 2011.
DOI : 10.1145/1944862.1944867
URL : https://hal.archives-ouvertes.fr/hal-00659411

A. Pop and A. Cohen, OpenStream, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.53:1-53:25, 2013.
DOI : 10.1145/2400682.2400712
URL : https://hal.archives-ouvertes.fr/hal-00786675

A. Pop, S. Pop, and J. Sjödin, Automatic Streamization in GCC, GCC Developer's Summit, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00817455

A. Portero, Z. Yu, and R. Giorgi, T-Star (T*): An x86-64 ISA extension to support thread execution on many cores, HiPEAC ACACES-2011, pp.277-280, 2011.

E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August, Parallel-stage decoupled software pipelining, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.114-123, 2008.
DOI : 10.1145/1356058.1356074
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.8168

J. Renau, J. Tuck, W. Liu, L. Ceze, K. Strauss et al., Tasking with out-of-order spawn in TLS chip multiprocessors, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.179-188, 2005.
DOI : 10.1145/1088149.1088173

L. Roh and W. A. Najjar, Design of storage hierarchy in multithreaded architectures, Proceedings of the 28th Annual International Symposium on Microarchitecture, pp.271-278, 1995.
DOI : 10.1109/MICRO.1995.476834

V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, 1989.

K. Stavrou, M. Nikolaides, D. Pavlou, S. Arandi, P. Evripidou et al., TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems, 2008 37th International Conference on Parallel Processing, pp.25-34, 2008.
DOI : 10.1109/ICPP.2008.74

J. Strohschneider and K. Waldschmidt, ADARC: a fine grain dataflow architecture with associative communication network, Proceedings of Twentieth Euromicro Conference. System Architecture and Integration, pp.445-450, 1994.
DOI : 10.1109/EURMIC.1994.390372

X. Tang and G. R. Gao, Automatically Partitioning Threads for Multithreaded Architectures, Journal of Parallel and Distributed Computing, vol.58, issue.2, pp.159-189, 1999.
DOI : 10.1006/jpdc.1999.1551

X. Tang, J. Wang, K. B. Theobald, and G. R. Gao, Thread partitioning and scheduling based on cost model, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, SPAA '97, pp.272-281, 1997.
DOI : 10.1145/258492.258519

K. Trifunovic, A. Cohen, L. Razya, and F. Li, Elimination of memory-based dependences for loop-nest optimization and parallelization, GCC Research Opportunities Workshop (GROW'11), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00992740

P. Tu and D. Padua, Gated SSA-based demand-driven symbolic analysis for parallelizing compilers, Proceedings of the 9th international conference on Supercomputing, ICS '95, pp.414-423, 1995.
DOI : 10.1145/224538.224648

A. H. Veen, Dataflow machine architecture, ACM Computing Surveys, vol.18, issue.4, pp.365-396, 1986.
DOI : 10.1145/27633.28055

P. Viola and M. Jones, Robust real-time object detection, International Journal of Computer Vision, 2001.

I. Watson and J. Gurd, A prototype data flow computer with token labelling, Managing Requirements Knowledge, p.623, 1979.

M. J. Wolfe, High Performance Compilers for Parallel Computing, 1995.