An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm, 18th IEEE International Conference on Parallel and Distributed Systems, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-01282311
BatchQueue : Fast and Memory-thrifty Core to Core Communication, 22nd International Symposium on Computer Architecture and High Performance Computing, IEEE, pp.215-222, 2010. ,
Thomas Preud'homme, Julien Sopena, Gaël Thomas, and Bertil Folliot. BatchQueue : Efficient core-to-core communication for pipeline parallelism, international poster [ASPLOS11]. ,
BatchQueue : file producteur/consommateur optimisée pour les multicoeurs, 8th Conférence Française en Systèmes d'Exploitation (CFSE'08), 2011. ,
The semantics of Power and ARM multiprocessor machine code, Proceedings of the 4th workshop on Declarative aspects of multicore programming, pp.13-24, 2009. ,
Laws of order : expensive synchronization in concurrent algorithms cannot be eliminated, POPL, pp.487-498, 2011. ,
Weak ordering - a new definition, Proceedings of the 17th Annual International Symposium on Computer Architecture, pp.2-14, 1990. ,
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
AMD64 Architecture Programmer's Manual Volume 2 : System Programming ,
Adrian Schüpbach, and Akhilesh Singhania. The multikernel : a new OS architecture for scalable multicore systems, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.29-44, 2009. ,
Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors, ACM Transactions on Programming Languages and Systems, vol.5, issue.2, p.189, 1983. ,
DOI : 10.1145/69624.357206
FastForward for efficient pipeline parallelism, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, 2008. ,
DOI : 10.1145/1345206.1345215
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, ASPLOS-XII, pp.151-162, 2006. ,
Sealing OS processes to improve dependability and safety, Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, EuroSys '07, pp.341-354, 2007. ,
Robust critical data recovery for MPEG-4 AAC encoded bitstreams, ICASSP, pp.397-400, 2010. ,
Computer Architecture - A Quantitative Approach, 2007. ,
The baskets queue. Principles of Distributed Systems, pp.401-414, 2007. ,
Array Building Blocks. http://software.intel.com/en-us/articles/intel-array-building-blocks ,
Cilk Plus. http://software.intel.com/en-us/articles/intel-cilk-plus ,
Multiview and millipage - fine-grain sharing in page-based DSMs. Operating systems review, pp.215-228, 1998. ,
Lazy release consistency for distributed shared memory, 1995. ,
Lazy release consistency for hardware-coherent multiprocessors, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '95, p.61, 1995. ,
DOI : 10.1145/224170.224398
Inter-core prefetching for multicore processors using migrating helper threads, ASPLOS, pp.393-404, 2011. ,
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, vol.28, issue.9, pp.690-691, 1979. ,
DOI : 10.1109/TC.1979.1675439
Specifying Concurrent Program Modules, ACM Transactions on Programming Languages and Systems, vol.5, issue.2, pp.190-222, 1983. ,
DOI : 10.1145/69624.357207
A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010. ,
DOI : 10.1109/IPDPS.2010.5470368
Remote Core Locking : Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications, USENIX Annual Technical Conference ,
URL : https://hal.archives-ouvertes.fr/hal-00991709
An optimistic approach to lock-free FIFO queues, Proceedings of Distributed Computing, pp.117-131, 2004. ,
Cramming More Components Onto Integrated Circuits, Proceedings of the IEEE, vol.86, issue.1, pp.82-85, 1998. ,
DOI : 10.1109/JPROC.1998.658762
Concurrent queues : Practical fetch-and-Φ algorithms, 1987. ,
Robust MAC-lite and soft header recovery for packetized multimedia transmission, IEEE Transactions on Communications, vol.58, issue.3, pp.775-784, 2010. ,
DOI : 10.1109/TCOMM.2010.03.080303
URL : https://hal.archives-ouvertes.fr/hal-00549101
Using elimination to implement scalable and lock-free FIFO queues, Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, SPAA '05, p.262, 2005. ,
DOI : 10.1145/1073970.1074013
Memory consistency models, ACM SIGOPS Operating Systems Review, vol.27, issue.1, pp.18-26, 1993. ,
Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors, Journal of Parallel and Distributed Computing, vol.51, issue.1, pp.1-26, 1998. ,
DOI : 10.1006/jpdc.1998.1446
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling, 2009 IEEE International Conference on Computer Design, pp.282-288, 2009. ,
DOI : 10.1109/ICCD.2009.5413143
A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, pp.5-14, 2011. ,
DOI : 10.1145/1944862.1944867
URL : https://hal.archives-ouvertes.fr/hal-00659411
Non-blocking algorithms for concurrent data structures, 1991. ,
A nonblocking algorithm for shared queues using compare-and-swap, IEEE Transactions on Computers, vol.43, issue.5, pp.548-559, 1994. ,
DOI : 10.1109/12.280802
The semantics of x86-CC multiprocessor machine code, ACM SIGPLAN Notices, vol.44, issue.1, pp.379-391, 2009. ,
DOI : 10.1145/1594834.1480929
The free lunch is over : A fundamental turn toward concurrency in software, Dr. Dobb's Journal, vol.30, issue.3, pp.202-210, 2005. ,
The Helios Tuple Space Library, Proceedings. Second Euromicro Workshop on Parallel and Distributed Processing, pp.325-331, 1994. ,
DOI : 10.1109/EMPDP.1994.592509
A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems, Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '01, p.143, 2001. ,
DOI : 10.1145/378580.378611
Implementing lock-free queues, Proceedings of the Seventh International Conference on Parallel and Distributed Computing Systems, pp.64-69, 1994. ,
Factored operating systems (fos), ACM SIGOPS Operating Systems Review, vol.43, issue.2, pp.76-85, 2009. ,
DOI : 10.1145/1531793.1531805
The History of the Development of Parallel Computing, 1994. ,
StreamIt: A Language for Streaming Applications, International Conference on Compiler Construction, 2002. ,
DOI : 10.1007/3-540-45937-5_14
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection, International Symposium on Code Generation and Optimization (CGO'07), pp.244-258, 2007. ,
DOI : 10.1109/CGO.2007.7
Clustered Communication for Efficient Pipelined Multithreading on Commodity MCPs, IAENG International Journal of Computer Science, vol.36, 2009. ,