Publications

International conferences

T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot, An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm, 18th IEEE International Conference on Parallel and Distributed Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01282311

T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot, BatchQueue : Fast and Memory-thrifty Core to Core Communication, 22nd International Symposium on Computer Architecture and High Performance Computing, IEEE, pp.215-222, 2010.

International poster

[ASPLOS11] T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot, BatchQueue : Efficient core-to-core communication for pipeline parallelism.

French conference

T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot, BatchQueue : file producteur/consommateur optimisée pour les multicoeurs, 8th Conférence Française en Systèmes d'Exploitation (CFSE'08), 2011.

References

J. Alglave, A. Fox, S. Ishtiaq, M. O. Myreen et al., The semantics of Power and ARM multiprocessor machine code, Proceedings of the 4th workshop on Declarative aspects of multicore programming, pp.13-24, 2009.

H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. M. Michael et al., Laws of order : expensive synchronization in concurrent algorithms cannot be eliminated, POPL, pp.487-498, 2011.

S. Adve and M. D. Hill, Weak ordering - a new definition, Proceedings of the 17th Annual International Symposium on Computer Architecture, pp.2-14, 1990.

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.

AMD, AMD64 Architecture Programmer's Manual Volume 2 : System Programming

A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, The multikernel : a new OS architecture for scalable multicore systems, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.29-44, 2009.

A. Gottlieb, B. D. Lubachevsky, and L. Rudolph, Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors, ACM Transactions on Programming Languages and Systems, vol.5, issue.2, p.189, 1983.
DOI : 10.1145/69624.357206

J. Giacomoni, T. Moseley, and M. Vachharajani, FastForward for efficient pipeline parallelism, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, 2008.
DOI : 10.1145/1345206.1345215

M. I. Gordon, W. Thies, and S. Amarasinghe, Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, ASPLOS-XII, pp.151-162, 2006.

G. Hunt, M. Aiken, M. Fähndrich, C. Hawblitzel, O. Hodson et al., Sealing OS processes to improve dependability and safety, Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, EuroSys '07, pp.341-354, 2007.

R. Hu, X. Huang, M. Kieffer, O. Derrien et al., Robust critical data recovery for MPEG-4 AAC encoded bitstreams, ICASSP, pp.397-400, 2010.

J. L. Hennessy and D. A. Patterson, Computer Architecture - A Quantitative Approach, 2007.

M. Hoffman, O. Shalev, and N. Shavit, The baskets queue, Principles of Distributed Systems, pp.401-414, 2007.

Intel, Array Building Blocks. http://software.intel.com/en-us/articles/intel-array-building-blocks

Intel, Cilk Plus. http://software.intel.com/en-us/articles/intel-cilk-plus

A. Itzkovitz and A. Schuster, MultiView and Millipage - fine-grain sharing in page-based DSMs, Operating systems review, pp.215-228, 1998.

P. Keleher, Lazy release consistency for distributed shared memory, 1995.

L. I. Kontothanassis, M. L. Scott, and R. Bianchini, Lazy release consistency for hardware-coherent multiprocessors, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '95, pp.61-61, 1995.
DOI : 10.1145/224170.224398

M. Kamruzzaman, S. Swanson, and D. M. Tullsen, Inter-core prefetching for multicore processors using migrating helper threads, ASPLOS, pp.393-404, 2011.

L. Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, vol.28, issue.9, pp.690-691, 1979.
DOI : 10.1109/TC.1979.1675439

L. Lamport, Specifying Concurrent Program Modules, ACM Transactions on Programming Languages and Systems, vol.5, issue.2, pp.190-222, 1983.
DOI : 10.1145/69624.357207

P. P. Lee, T. Bu, and G. Chandranmenon, A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470368

J.-P. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller, Remote Core Locking : Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications, USENIX Annual Technical Conference
URL : https://hal.archives-ouvertes.fr/hal-00991709

E. Ladan-Mozes and N. Shavit, An optimistic approach to lock-free FIFO queues, Proceedings of Distributed Computing, pp.117-131, 2004.

G. E. Moore, Cramming More Components Onto Integrated Circuits, Proceedings of the IEEE, vol.86, issue.1, pp.82-85, 1998.
DOI : 10.1109/JPROC.1998.658762

J. M. Mellor-Crummey, Concurrent queues : Practical fetch-and-Φ algorithms, 1987.

C. Marin, Y. Leprovost, M. Kieffer, and P. Duhamel, Robust MAC-lite and soft header recovery for packetized multimedia transmission, IEEE Transactions on Communications, vol.58, issue.3, pp.775-784, 2010.
DOI : 10.1109/TCOMM.2010.03.080303
URL : https://hal.archives-ouvertes.fr/hal-00549101

M. Moir, D. Nussbaum, O. Shalev, and N. Shavit, Using elimination to implement scalable and lock-free FIFO queues, Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, SPAA '05, p.262, 2005.
DOI : 10.1145/1073970.1074013

D. Mosberger, Memory consistency models, ACM SIGOPS Operating Systems Review, vol.27, issue.1, pp.18-26, 1993.

M. M. Michael and M. L. Scott, Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors, Journal of Parallel and Distributed Computing, vol.51, issue.1, pp.1-26, 1998.
DOI : 10.1006/jpdc.1998.1446

J. Meng and K. Skadron, Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling, 2009 IEEE International Conference on Computer Design, pp.282-288, 2009.
DOI : 10.1109/ICCD.2009.5413143

A. Pop and A. Cohen, A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, pp.5-14, 2011.
DOI : 10.1145/1944862.1944867
URL : https://hal.archives-ouvertes.fr/hal-00659411

S. Prakash, Y. Lee, and T. Johnson, Non-blocking algorithms for concurrent data structures, 1991.

S. Prakash, Y. Lee, and T. Johnson, A nonblocking algorithm for shared queues using compare-and-swap, IEEE Transactions on Computers, vol.43, issue.5, pp.548-559, 1994.
DOI : 10.1109/12.280802

S. Sarkar, P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge et al., The semantics of x86-CC multiprocessor machine code, ACM SIGPLAN Notices, vol.44, issue.1, pp.379-391, 2009.
DOI : 10.1145/1594834.1480929

H. Sutter, The free lunch is over : A fundamental turn toward concurrency in software, Dr. Dobb's Journal, vol.30, issue.3, pp.202-210, 2005.

L. M. Silva, B. Veer, and J. G. Silva, The Helios Tuple Space Library, Proceedings. Second Euromicro Workshop on Parallel and Distributed Processing, pp.325-331, 1994.
DOI : 10.1109/EMPDP.1994.592509

P. Tsigas and Y. Zhang, A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems, Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '01, p.143, 2001.
DOI : 10.1145/378580.378611

J. D. Valois, Implementing lock-free queues, Proceedings of the Seventh International Conference on Parallel and Distributed Computing Systems, pp.64-69, 1994.

D. Wentzlaff and A. Agarwal, Factored operating systems (fos), ACM SIGOPS Operating Systems Review, vol.43, issue.2, pp.76-85, 2009.
DOI : 10.1145/1531793.1531805

G. V. Wilson, The History of the Development of Parallel Computing, 1994.

W. Thies, M. Karczmarek, and S. Amarasinghe, StreamIt: A Language for Streaming Applications, International Conference on Compiler Construction, 2002.
DOI : 10.1007/3-540-45937-5_14

C. Wang, H. Kim, Y. Wu, and V. Ying, Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection, International Symposium on Code Generation and Optimization (CGO'07), pp.244-258, 2007.
DOI : 10.1109/CGO.2007.7

Z. Zhang, K. Ootsu, T. Yokota, and T. Baba, Clustered Communication for Efficient Pipelined Multithreading on Commodity MCPs, IAENG International Journal of Computer Science, vol.36, 2009.