A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: principles, techniques, and tools, pp.99-113, 1986.

G. Al-Kadi and A. Terechko, A Hardware Task Scheduler for Embedded Video Processing, Proceedings of the 4th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC'09), p.19, 2009.
DOI : 10.1007/978-3-540-92990-1_12

URL : https://hal.archives-ouvertes.fr/inria-00445874

M. Aldinucci, M. Meneghin, and M. Torquati, Efficient Smith-Waterman on Multi-core with FastFlow, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.195-199, 2010.
DOI : 10.1109/PDP.2010.93

A. W. Appel, Modern Compiler Implementation in C, p.149, 1998.
DOI : 10.1017/CBO9781139174930

Arvind, R. S. Nikhil, and K. K. Pingali, I-structures: data structures for parallel computing, ACM Transactions on Programming Languages and Systems, vol.11, issue.4, pp.598-632, 1989.
DOI : 10.1145/69558.69562

H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. M. Michael et al., Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated, Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL'11), pp.487-498, 2011.

C. Augonnet, S. Thibault, R. Namyst, and M. Nijhuis, Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System, Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS'09), pp.329-339, 2009.
DOI : 10.1007/978-3-642-03138-0_36

URL : https://hal.archives-ouvertes.fr/inria-00378705

A. Azevedo, C. Meenderinck, B. Juurlink, A. T. Hoogerbrugge, M. Alvarez et al., Parallel H.264 Decoding on an Embedded Multicore Processor, Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC'09), p.80, 2009.
DOI : 10.1007/978-3-540-92990-1_29

URL : https://hal.archives-ouvertes.fr/inria-00446428

W. Baek, C. C. Minh, M. Trautmann, C. Kozyrakis, and K. Olukotun, The OpenTM Transactional Application Programming Interface, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.376-387, 2007.
DOI : 10.1109/PACT.2007.4336227

G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete, Cyclo-static data flow, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.3255-3258, 1995.
DOI : 10.1109/ICASSP.1995.479579

OpenMP Architecture Review Board, OpenMP Application Program Interface, pp.95-117, 2011.

H.-J. Boehm and S. V. Adve, Foundations of the C++ concurrency memory model, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation (PLDI'08), pp.68-78, 2008.

P. Carpenter, D. Ródenas, X. Martorell, A. Ramírez, and E. Ayguadé, A Streaming Machine Description and Programming Model, Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS'07), pp.107-116, 2007.
DOI : 10.1007/978-3-540-73625-7_13

P. Caspi and M. Pouzet, Synchronous Kahn Networks, Proceedings of the first ACM SIGPLAN international conference on Functional programming (ICFP'96), pp.226-238, 1996.
DOI : 10.1145/232627.232651

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9168

A. Cohen, L. Mandel, F. Plateau, and M. Pouzet, Abstraction of Clocks in Synchronous Data-Flow Systems, Proceedings of the 6th Asian Symposium on Programming Languages and Systems (APLAS'08), 2008.
DOI : 10.1016/0167-6423(91)90001-E

URL : https://hal.archives-ouvertes.fr/hal-01257274

Inmos Corp., Occam Programming Manual, 1984.

D. E. Culler and Arvind, Resource requirements of dataflow programs, Proceedings of the 15th Annual International Symposium on Computer architecture (ISCA'88), pp.141-150, 1988.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, An efficient method of computing static single assignment form, Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '89, pp.25-35, 1989.
DOI : 10.1145/75277.75280

J. B. Dennis and G. R. Gao, An efficient pipelined dataflow processor architecture, Supercomputing (SC'88), pp.368-373, 1988.

U. Drepper, What every programmer should know about memory, 2007.

U. Drepper, Futexes are Tricky, p.72, 2009.

M. Fomitchev and E. Ruppert, Lock-free linked lists and skip lists, Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, PODC '04, pp.50-59, 2004.
DOI : 10.1145/1011767.1011776

C. Fournet and G. Gonthier, The Reflexive Chemical Abstract Machine and the Join-Calculus, Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL '96), pp.372-385, 1996.

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI'98), pp.212-223, 1998.

J. Giacomoni, T. Moseley, and M. Vachharajani, FastForward for efficient pipeline parallelism, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.43-52, 2008.
DOI : 10.1145/1345206.1345215

M. I. Gordon, W. Thies, and S. Amarasinghe, Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS-XII), pp.151-162, 2006.

R. Gupta and S. Lee, Exploiting parallelism on a fine-grained MIMD architecture based upon channel queues, International Journal of Parallel Programming, vol.2, issue.3, pp.169-192, 1992.
DOI : 10.1007/BF01408554

W. Haid, L. Schor, K. Huang, I. Bacivarov, and L. Thiele, Efficient execution of Kahn process networks on multi-processor systems using protothreads and windowed FIFOs, 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia, pp.35-44, 2009.
DOI : 10.1109/ESTMED.2009.5336828

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud, The synchronous data flow programming language LUSTRE, Proceedings of the IEEE, pp.1305-1320, 1991.
DOI : 10.1109/5.97300

R. H. Halstead Jr., MULTILISP: a language for concurrent symbolic computation, ACM Transactions on Programming Languages and Systems, vol.7, issue.4, pp.501-538, 1985.

N. Heintze and O. Tardieu, Ultra-fast aliasing analysis using CLA: a million lines of C code in a second, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation (PLDI '01), pp.254-263, 2001.

T. Henriksson and P. van der Wolf, TTL Hardware Interface: A High-Level Interface for Streaming Multiprocessor Architectures, 2006 IEEE/ACM/IFIP Workshop on Embedded Systems for Real Time Multimedia, pp.107-112, 2006.
DOI : 10.1109/ESTMED.2006.321282

C. A. R. Hoare, Communicating Sequential Processes, 1985.

H. P. Hofstee, Power Efficient Processor Architecture and The Cell Processor, 11th International Symposium on High-Performance Computer Architecture, pp.258-262, 2005.
DOI : 10.1109/HPCA.2005.26

G. Kahn, The semantics of a simple language for parallel programming, Information processing, pp.471-475, 1974.

R. Kennedy, S. Chan, S. Liu, R. Lo, P. Tu et al., Partial redundancy elimination in SSA form, ACM Transactions on Programming Languages and Systems, vol.21, issue.3, pp.627-676, 1999.
DOI : 10.1145/319301.319348

C. Kim, J. Gaudiot, and W. Proskurowski, Parallel Computing with the Sisal Applicative Language: Programmability and Performance Issues, Software: Practice and Experience, pp.1025-1051, 1996.

J. Knoop, O. Rüthing, and B. Steffen, Lazy code motion, ACM SIGPLAN Notices, vol.39, issue.4, pp.460-472, 2004.
DOI : 10.1145/989393.989439

C. Kyriacou, P. Evripidou, and P. Trancoso, Data-Driven Multithreading Using Conventional Microprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.17, issue.10, pp.1176-1188, 2006.
DOI : 10.1109/TPDS.2006.136

E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, IEEE Transactions on Computers, vol.36, issue.1, pp.24-35, 1987.

E. A. Lee and A. Sangiovanni-Vincentelli, A framework for comparing models of computation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.17, issue.12, pp.1217-1229, 1998.

V. Marjanovic, J. Labarta, E. Ayguadé, and M. Valero, Effective communication and computation overlap with hybrid MPI/SMPSs, Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'10), p.117, 2010.

P. E. McKenney, Memory ordering in modern microprocessors, Part I, Linux Journal, issue.136, 2005.

M. M. Michael, High performance dynamic lock-free hash tables and list-based sets, Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures (SPAA'02), pp.73-82, 2002.

R. Milner, J. Parrow, and D. Walker, A Calculus of Mobile Processes, I and II, Information and Computation, pp.1-40, 1992.

C. Miranda, P. Dumont, A. Cohen, M. Duranton, and A. Pop, ERBIUM, Proceedings of the 7th ACM international conference on Computing frontiers, CF '10, pp.119-120, 2010.
DOI : 10.1145/1787275.1787312

URL : https://hal.archives-ouvertes.fr/inria-00551510

C. Miranda, A. Pop, P. Dumont, A. Cohen, and M. Duranton, Erbium, Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, CASES '10, pp.11-20, 2010.
DOI : 10.1145/1878921.1878924

URL : https://hal.archives-ouvertes.fr/inria-00551510

M. Olszewski, J. Ansel, and S. Amarasinghe, Kendo: Efficient Deterministic Multithreading in Software, Proceedings of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS '09), pp.97-108, 2009.

G. Ottoni, R. Rangan, A. Stoler, and D. I. August, Automatic Thread Extraction with Decoupled Software Pipelining, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), pp.105-118, 2005.
DOI : 10.1109/MICRO.2005.13

S. Owens, S. Sarkar, and P. Sewell, A Better x86 Memory Model: x86-TSO, Theorem Proving in Higher Order Logics, pp.391-407, 2009.
DOI : 10.1007/11817963_46

D. J. Pearce, P. H. Kelly, and C. Hankin, Efficient field-sensitive pointer analysis of C, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.30, issue.1, 2007.

J. M. Pérez, P. Bellens, R. M. Badia, and J. Labarta, CellSs: Making it easier to program the Cell Broadband Engine processor, IBM Journal of Research and Development, vol.51, issue.5, pp.593-604, 2007.

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

A. Pop and A. Cohen, Preserving high-level semantics of parallel programming annotations through the compilation flow of optimizing compilers, Proceedings of the 15th Workshop on Compilers for Parallel Computers (CPC'10)
URL : https://hal.archives-ouvertes.fr/inria-00551518

A. Pop and A. Cohen, A Stream-Computing Extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC'11), pp.5-14, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00551507

A. Pop, S. Pop, H. Jagasia, J. Sjödin, and P. H. Kelly, Improving GNU Compiler Collection Infrastructure for Streamization, Proceedings of the 2008 GCC Developers' Summit, pp.77-86, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00817445

A. Pop, S. Pop, and J. Sjödin, Automatic Streamization in GCC, Proceedings of the 2009 GCC Developers' Summit, p.117, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00817455

S. Pop, Analysis of induction variables using chains of recurrences: extensions, Master's thesis.

S. Pop, P. Clauss, A. Cohen, V. Loechner, and G. Silber, Fast recognition of scalar evolutions on three-address SSA code, p.154, 2004.

S. Pop, A. Cohen, and G. Silber, Induction Variable Analysis with Delayed Abstractions, High Performance Embedded Architectures and Compilers, pp.218-232, 2005.
DOI : 10.1007/11587514_15

URL : https://hal.archives-ouvertes.fr/hal-01257294

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Silber et al., GRAPHITE: Loop optimizations based on the polyhedral model for GCC, Proceedings of the 4th GCC Developers' Summit, pp.179-198, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01257284

M. C. Rinard and M. S. Lam, The design, implementation and evaluation of Jade, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.20, issue.3, pp.483-545, 1998.

S. Sarkar, P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge et al., The semantics of x86-CC multiprocessor machine code, Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '09, pp.379-391, 2009.

M. Själander, A. Terechko, and M. Duranton, A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures, 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, 2008.
DOI : 10.1109/DSD.2008.45

K. Stavrou, M. Nikolaides, D. Pavlou, S. Arandi, P. Evripidou et al., TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems, 2008 37th International Conference on Parallel Processing, pp.25-34, 2008.
DOI : 10.1109/ICPP.2008.74

R. Stephens, A survey of stream processing, Acta Informatica, vol.34, issue.7, pp.491-541, 1997.
DOI : 10.1007/s002360050095

S. Stuijk, Concurrency in Computational Networks, Master's thesis, p.80, 2002.

A. S. Tanenbaum, Modern operating systems, 2001.

W. Thies and S. Amarasinghe, An empirical characterization of stream programs and its implications for language and compiler design, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10, p.80, 2010.
DOI : 10.1145/1854273.1854319

W. Thies, M. Karczmarek, and S. Amarasinghe, StreamIt: A Language for Streaming Applications, Proceedings of the 11th International Conference on Compiler Construction (CC'02), pp.179-196, 2002.
DOI : 10.1007/3-540-45937-5_14

J. D. Valois, Lock-free linked lists using compare-and-swap, Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing, PODC '95, pp.214-222, 1995.
DOI : 10.1145/224964.224988

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.9506

T. VanDrunen, Partial Redundancy Elimination for Global Value Numbering.

T. VanDrunen and A. L. Hosking, Value-Based Partial Redundancy Elimination, Compiler Construction, 13th International Conference, pp.167-184, 2004.
DOI : 10.1007/978-3-540-24723-4_12

I. Watson and J. R. Gurd, A Practical Data Flow Computer, Computer, vol.15, issue.2, pp.51-57, 1982.
DOI : 10.1109/MC.1982.1653941