Memorax, a precise and sound tool for automatic fence insertion under TSO, Tools and Algorithms for the Construction and Analysis of Systems, 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, pp.530-536, 2013.

T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy, Scheduler activations: Effective kernel support for the user-level management of parallelism, SOSP '91.

M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, FastFlow: high-level and efficient streaming on multi-core, Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13, 2008.

J. Alglave, A. C. J. Fox, S. Ishtiaq, M. O. Myreen, S. Sarkar et al., The semantics of Power and ARM multiprocessor machine code, Proceedings of the 4th workshop on Declarative aspects of multicore programming, DAMP '09, pp.13-24, 2009.
DOI : 10.1145/1481839.1481842

H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. M. Michael et al., Laws of order: Expensive synchronization in concurrent algorithms cannot be eliminated, Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.487-498, 2011.

S. V. Adve and M. D. Hill, Weak ordering - a new definition, ISCA, 1990.

J. Alglave, D. Kroening, V. Nimal, and D. Poetzl, Don't sit on the fence: A static analysis approach to automatic fence insertion, Computer Aided Verification, 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic. Proceedings, pp.508-524, 2014.

J. Alglave, D. Kroening, and M. Tautschnig, Partial Orders for Efficient Bounded Model Checking of Concurrent Software, Computer Aided Verification, 25th International Conference, CAV 2013 Proceedings, pp.141-157, 2013.
DOI : 10.1007/978-3-642-39799-8_9

J. Alglave, L. Maranget, S. Sarkar, and P. Sewell, Fences in Weak Memory Models, Computer Aided Verification, 22nd International Conference, CAV 2010 Proceedings, pp.258-272, 2010.
DOI : 10.1007/978-3-642-14295-6_25

URL : https://hal.archives-ouvertes.fr/hal-01100859

J. Alglave, L. Maranget, S. Sarkar, and P. Sewell, Litmus: Running Tests against Hardware, TACAS, 2011.

URL : https://hal.archives-ouvertes.fr/hal-01100851

J. Alglave, L. Maranget, and M. Tautschnig, Herding cats: modelling, simulation, testing, and data-mining for weak memory, ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01081413

H.-J. Boehm and S. V. Adve, Foundations of the C++ concurrency memory model, PLDI, 2008.

Z. Budimlić, M. Burke, V. Cavé, K. Knobe, G. Lowney et al., Concurrent collections, Scientific Programming, vol.18, pp.203-217, 2010.

H.-J. Boehm and B. Demsky, Outlawing ghosts, Proceedings of the workshop on Memory Systems Performance and Correctness, MSPC '14, pp.1-7, 2014.
DOI : 10.1145/2618128.2618134

M. Batty, M. Dodds, and A. Gotsman, Library abstraction for C/C++ concurrency, Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, pp.235-248, 2013.

A. Bouajjani, E. Derevenetc, and R. Meyer, Checking and Enforcing Robustness against TSO, Proceedings of the 22nd European Conference on Programming Languages and Systems, ESOP '13, pp.533-553, 2013.
DOI : 10.1007/978-3-642-37036-6_29

M. Batty, A. F. Donaldson, and J. Wickerson, Overhauling SC atomics in C11 and OpenCL, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp.634-648, 2016.

P. Becker, Standard for Programming Language C++, ISO/IEC 14882, 2011.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, 1999.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

M. Batty, K. Memarian, S. Owens, S. Sarkar, and P. Sewell, Clarifying and compiling C/C++ concurrency: from C++11 to POWER, POPL, 2012.

M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber, Mathematizing C++ concurrency, Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, pp.55-66, 2011.

G. Boudol and G. Petri, Relaxed memory models: an operational approach, Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.392-403, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00420352

C. J. Burgess and M. Saidi, The automatic generation of test cases for optimizing Fortran compilers, Information and Software Technology, vol.38, issue.2, pp.111-119, 1996.
DOI : 10.1016/0950-5849(95)01055-6

A. S. Boujarwah and K. Saleh, Compiler test case generation methods: a survey and assessment, Information & Software Technology, vol.39, issue.9, pp.617-625, 1997.

C++11 mappings to processors.

J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker et al., Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming, vol.5, issue.3, 1996.
DOI : 10.1155/1996/483083

D. Chase and Y. Lev, Dynamic circular work-stealing deque, Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, SPAA '05, pp.21-28, 2005.
DOI : 10.1145/1073970.1073974

P. Caspi, D. Pilaud, N. Halbwachs, and J. Plaice, Lustre: A declarative language for programming synchronous systems, Conference Record of the Fourteenth Annual ACM Symposium on Principles of Programming Languages, pp.178-188, 1987.

S. Chakraborty and V. Vafeiadis, Validating optimizations of concurrent C/C++ programs, Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO), pp.216-226, 2016.

S. Chakraborty and V. Vafeiadis, Formalizing the concurrency semantics of an LLVM fragment, International Symposium on Code Generation and Optimization (CGO), 2017.

D. Dechev, P. Pirkelbauer, and B. Stroustrup, Understanding and Effectively Preventing the ABA Problem in Descriptor-Based Lock-Free Designs, 2010 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, pp.185-192, 2010.
DOI : 10.1109/ISORC.2010.10

R. Elhorst, Lowering C11 atomics for ARM in LLVM, 2014.

E. Eide and J. Regehr, Volatiles are miscompiled, and what to do about it, Proceedings of the 7th ACM international conference on Embedded software, EMSOFT '08, 2008.
DOI : 10.1145/1450058.1450093

L. R. Ford and D. R. Fulkerson, Maximal flow through a network, Canadian Journal of Mathematics, vol.8, issue.3, pp.399-404, 1956.

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, PLDI '98.

J. Giacomoni, T. Moseley, and M. Vachharajani, FastForward for efficient pipeline parallelism, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.43-52, 2008.
DOI : 10.1145/1345206.1345215

A. V. Goldberg and R. E. Tarjan, A new approach to the maximum-flow problem, J. ACM, vol.35, issue.4, pp.921-940, 1988.

M. I. Gordon, W. Thies, and S. Amarasinghe, Exploiting Coarse-grained Task, Data, and Pipeline Parallelism in Stream Programs.

R. N. Horspool and H. C. Ho, Partial redundancy elimination driven by a cost-benefit analysis, Proceedings of the Eighth Israeli Conference on Computer Systems and Software Engineering, pp.111-118, 1997.

Intel® Itanium® Architecture Software Developer's Manual, 2002.

G. Kahn, The semantics of a simple language for parallel programming, IFIP Congress, pp.471-475, 1974.

J. Kang, C.-K. Hur, O. Lahav, V. Vafeiadis, and D. Dreyer, A promising semantics for relaxed-memory concurrency, Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, pp.175-189, 2017.

M. Kudlur and S. Mahlke, Orchestrating the Execution of Stream Programs on Multicore Platforms.

M. Kuperstein, M. Vechev, and E. Yahav, Automatic inference of memory fences, Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design, FMCAD '10, pp.111-120, 2010.
DOI : 10.1145/2261417.2261438

C. Lattner and V. S. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, CGO 2004, pp.75-86, 2004.
DOI : 10.1109/CGO.2004.1281665

L. Lamport, Proving the correctness of multiprocess programs, IEEE Transactions on Software Engineering, vol.SE-3, issue.2, pp.125-143, 1977.

L. Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, vol.28, issue.9, pp.690-691, 1979.
DOI : 10.1109/TC.1979.1675439

P. P. C. Lee, T. Bu, and G. Chandranmenon, A lock-free, cache-efficient shared ring buffer for multi-core architectures, ANCS '09.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: building customized program analysis tools with dynamic instrumentation, PLDI, 2005.

N. M. Lê, Kahn process networks as concurrent data structures: lock freedom, parallelism, relaxation in shared memory, 2016.

N. M. Lê, A. Guatto, A. Cohen, and A. Pop, Correct and Efficient Bounded FIFO Queues, 2013 25th International Symposium on Computer Architecture and High Performance Computing, pp.144-151, 2013.
DOI : 10.1109/SBAC-PAD.2013.8

C. Lindig, Random testing of C calling conventions, Proceedings of the Sixth International Symposium on Automated Analysis-driven Debugging, AADEBUG '05, 2005.
DOI : 10.1145/1085130.1085132

E. A. Lee and D. G. Messerschmitt, Synchronous data flow, Proc. of the IEEE, pp.1235-1245, 1987.
DOI : 10.1109/PROC.1987.13876

F. Liu, N. Nedev, N. Prisadnikov, M. Vechev, and E. Yahav, Dynamic synthesis for relaxed memory models, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pp.429-440, 2012.
DOI : 10.1145/2254064.2254115

URL : http://www.cs.technion.ac.il/~yahave/papers/pldi12-fender.pdf

N. M. Lê, A. Pop, A. Cohen, and F. Z. Nardelli, Correct and efficient work-stealing for weak memory models, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.69-80, 2013.

O. Lahav and V. Vafeiadis, Owicki-Gries Reasoning for Weak Memory Models, Proceedings, Part II, of the 42nd International Colloquium on Automata, Languages, and Programming, pp.311-323, 2015.
DOI : 10.1007/978-3-662-47666-6_25

P. E. McKenney, J. Alglave, L. Maranget, A. Parri, A. Stern et al., Linux-kernel memory ordering: Help arrives at last!, 2016.

W. M. McKeeman, Differential testing for software, Digital Technical Journal, vol.10, issue.1, pp.100-107, 1998.

P. McKenney, P0190R0: Proposal for new memory_order_consume definition, 2016.

C. Min and Y. I. Eom, DANBI: Dynamic Scheduling of Irregular Stream Programs for Many-core Systems, PACT '13.
DOI : 10.1109/tpds.2014.2325833

S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave et al., An Axiomatic Memory Model for POWER Multiprocessors, Computer Aided Verification, 24th International Conference Proceedings, pp.495-512, 2012.
DOI : 10.1007/978-3-642-31424-7_36

URL : https://hal.archives-ouvertes.fr/hal-01100773

J. Manson, W. Pugh, and S. V. Adve, The Java memory model, Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '05, pp.378-391, 2005.
DOI : 10.1145/1047659.1040336

R. Morisset, P. Pawan, and F. Z. Nardelli, Compiler testing via a theory of sound optimisations in the C11/C++11 memory model, ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.187-196, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00909083

P. E. McKenney and R. Silvera, 2011.

R. Morisset and F. Z. Nardelli, Partially redundant fence elimination for x86, ARM, and power processors, Proceedings of the 26th International Conference on Compiler Construction, CC 2017, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01423612

B. Norris and B. Demsky, CDSChecker: Checking concurrent data structures written with C/C++ atomics, Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA '13, pp.131-150, 2013.

R. Namyst and J. Méhaut, PM2: Parallel Multithreaded Machine. A Computing Environment for Distributed Architectures, PARCO, 1995.

S. Owens, S. Sarkar, and P. Sewell, A Better x86 Memory Model: x86-TSO, Theorem Proving in Higher Order Logics, 22nd International Conference Proceedings, pp.391-407, 2009.

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, The International Journal of High Performance Computing Applications, vol.17, issue.1, 2009.

A. Pop and A. Cohen, OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs, ACM Trans. Archit. Code Optim., vol.9, issue.4, pp.53:1-53:25, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00786675

J. Pichon-pharabod and P. Sewell, A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.622-633, 2016.

J. Regehr, Y. Chen, P. Cuoq, E. Eide, C. Ellison et al., Test-case reduction for C compiler bugs, PLDI, 2012.

J. Ševčík and D. Aspinall, On validity of program transformations in the Java memory model, ECOOP 2008, Object-Oriented Programming, 22nd European Conference Proceedings, pp.27-51, 2008.

J. Ševčík, Program Transformations in Weak Memory Models, 2008.

J. Ševčík, Safe optimisations for shared-memory concurrent programs, PLDI, 2011.

Z. Sura, X. Fang, C. Wong, S. P. Midkiff, J. Lee et al., Compiler techniques for high performance sequentially consistent Java programs, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '05, pp.2-13, 2005.
DOI : 10.1145/1065944.1065947

F. Sheridan, Practical testing of a C99 compiler using output comparison, Software: Practice and Experience, pp.1475-1488, 2007.
DOI : 10.1002/spe.812

B. Scholz, R. Horspool, and J. Knoop, Optimizing for space and time usage with speculative partial redundancy elimination, Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pp.221-230, 2004.
DOI : 10.1145/998300.997195

K. Memarian, S. Owens, M. Batty, P. Sewell, L. Maranget et al., ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pp.311-322, 2012.

R. L. Solder, A general test data generator for COBOL, AFIPS Joint Computer Conferences, 1962.

D. Shasha and M. Snir, Efficient and correct execution of parallel programs that share memory, ACM Transactions on Programming Languages and Systems, vol.10, issue.2, pp.282-312, 1988.
DOI : 10.1145/42190.42277

S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams, Understanding POWER multiprocessors, Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pp.175-186, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01100824

P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen, x86-TSO, Communications of the ACM, vol.53, issue.7, pp.89-97, 2010.
DOI : 10.1145/1785414.1785443

J. Suettlerlein, S. Zuckerman, and G. R. Gao, An Implementation of the Codelet Model, Euro-Par, pp.633-644, 2013.
DOI : 10.1007/978-3-642-40047-6_63

W. Thies and S. P. Amarasinghe, An empirical characterization of stream programs and its implications for language and compiler design, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10.
DOI : 10.1145/1854273.1854319

A. Terekhov, Brief tentative example x86 implementation for C/C++ memory model, 2008.

W. Thies, M. Karczmarek, and S. P. Amarasinghe, StreamIt: A Language for Streaming Applications, CC '02.
DOI : 10.1007/3-540-45937-5_14

A. Turon, V. Vafeiadis, and D. Dreyer, GPS: navigating weak memory with ghosts, protocols, and separation, Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, pp.691-707, 2014.

V. Vafeiadis, T. Balabonski, S. Chakraborty, R. Morisset, and F. Z. Nardelli, Common compiler optimisations are invalid in the C11 memory model and what we can do about it, Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.209-220, 2015.

H. Vandierendonck, K. Chronaki, and D. S. Nikolopoulos, Deterministic scale-free pipeline parallelism with hyperqueues, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13.
DOI : 10.1145/2503210.2503233

V. Vafeiadis and C. Narayan, Relaxed separation logic: a program logic for C11 concurrency, Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, part of SPLASH 2013, pp.867-884, 2013.

V. Vafeiadis and F. Z. Nardelli, Verifying Fence Elimination Optimisations, Static Analysis, 18th International Symposium, SAS 2011 Proceedings, pp.146-162, 2011.

D. L. Weaver and T. Germond, The SPARC architecture manual, 2003.

J. Xue and J. Knoop, A fresh look at PRE as a maximum flow problem, Compiler Construction, 15th International Conference, Held as Part of the Joint European Conferences on Theory and Practice of Software, Proceedings, pp.139-154, 2006.

X. Yang, Y. Chen, E. Eide, and J. Regehr, Finding and understanding bugs in C compilers, PLDI, 2011.
DOI : 10.1145/2345156.1993532

URL : http://www.stanford.edu/class/cs343/resources/finding-bugs-compilers.pdf

C. Zhao, Y. Xue, Q. Tao, L. Guo, and Z. Wang, Automated test program generation for an industrial optimizing compiler, AST, 2009.