J. R. Allen and K. Kennedy, PFC: A program to convert Fortran to parallel form, 1982.

R. Allen and K. Kennedy, Automatic translation of FORTRAN programs to vector form, ACM Transactions on Programming Languages and Systems, vol.9, issue.4, pp.491-542, 1987.
DOI : 10.1145/29873.29875

M. S. Alnaes, A. Logg, K. B. Ølgaard, M. E. Rognes, and G. N. Wells, Unified form language, ACM Transactions on Mathematical Software, vol.40, issue.2, p.9, 2014.
DOI : 10.1145/2566630

K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The landscape of parallel computing research: A view from Berkeley, 2006.

R. Baghdadi, A. Cohen, C. Bastoul, L. Pouchet, and L. Rauchwerger, The potential of synergistic static, dynamic and speculative loop nest optimizations for automatic parallelization, Workshop on Parallel Execution of Sequential Programs on Multicore Architectures (PESPMA'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00494305

R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse et al., PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015.
DOI : 10.1109/PACT.2015.17

URL : https://hal.archives-ouvertes.fr/hal-01257236

R. Baghdadi, A. Cohen, T. Grosser, S. Verdoolaege, A. Lokhmotov et al., PENCIL Language Specification, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154812

U. Banerjee, Data dependence in ordinary programs, 1976.

U. K. Banerjee, Dependence Analysis for Supercomputing, 1988.
DOI : 10.1007/978-1-4684-6894-6

M. M. Baskaran, J. Ramanujam, and P. Sadayappan, Automatic C-to-CUDA Code Generation for Affine Programs, Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, pp.244-263, 2010.
DOI : 10.1007/978-3-642-11970-5_14

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques, PACT 2004, pp.7-16, 2004.
DOI : 10.1109/PACT.2004.1342537

URL : https://hal.archives-ouvertes.fr/hal-00017260

C. Bastoul and P. Feautrier, Improving Data Locality by Chunking, Proceedings of the 12th International Conference on Compiler Construction, CC'03, pp.320-334, 2003.
DOI : 10.1007/3-540-36579-6_23

URL : https://hal.archives-ouvertes.fr/inria-00001055

U. Beaugnon, A. Kravets, S. van Haastregt, R. Baghdadi, D. Tweed et al., VOBLA: A vehicle for optimized basic linear algebra, LCTES, pp.115-124, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01508181

M. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul, The Polyhedral Model Is More Widely Applicable Than You Think, Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, pp.283-303, 2010.
DOI : 10.1007/978-3-642-11970-5_16

URL : https://hal.archives-ouvertes.fr/inria-00551087

A. Bernstein, Analysis of programs for parallel processing, IEEE Transactions on Electronic Computers, vol.EC-15, issue.5, pp.757-763, 1966.

W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger et al., Polaris: The next generation in parallelizing compilers, Proceedings of the Seventh Workshop on Languages and Compilers for Parallel Computing, pp.141-154, 1994.

U. Bondhugula, Compiling affine loop nests for distributed-memory parallel architectures, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-12, 2013.
DOI : 10.1145/2503210.2503289

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, pp.101-113, 2008.

U. Bondhugula, O. Gunluk, S. Dash, and L. Renganarayanan, A model for fusion and code motion in an automatic parallelizing compiler, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10, pp.343-352, 2010.
DOI : 10.1145/1854273.1854317

F. Bouchez, A. Darte, C. Guillon, and F. Rastello, Register Allocation: What Does the NP-Completeness Proof of Chaitin et al. Really Prove? Or Revisiting Register Allocation: Why and How, LCPC'06, 2006.
DOI : 10.1007/978-3-540-72521-3_21

D. Callahan, S. Carr, and K. Kennedy, Improving register allocation for subscripted variables, Symp. on Programming Language Design and Implementation (PLDI'90), 1990.

H. Chafi, A. K. Sujeeth, K. J. Brown, H. Lee, A. R. Atreya et al., A domain-specific approach to heterogeneous parallelism, PPoPP, pp.35-46, 2011.

G. J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. W. Markstein, Register allocation via coloring, Computer Languages, vol.6, issue.1, pp.47-57, 1981.
DOI : 10.1016/0096-0551(81)90048-5

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797

C. Chiw, G. Kindlmann, J. Reppy, L. Samuels, and N. Seltzer, Diderot: A parallel DSL for image analysis and visualization, PLDI, pp.111-120, 2012.

clMath Developers Team, OpenCL math library, 2013. URL https

A. Cohen and E. Rohou, Processor virtualization and split compilation for heterogeneous multicore embedded systems, Proceedings of the 47th Design Automation Conference, DAC '10, 2010.
DOI : 10.1145/1837274.1837303

URL : https://hal.archives-ouvertes.fr/inria-00472274

Cray Inc., Cray Standard C/C++ Reference Manual, 2012.

A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth et al., The Scalable Heterogeneous Computing (SHOC) benchmark suite, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, pp.63-74, 2010.
DOI : 10.1145/1735688.1735702

A. Darte and G. Huard, New Complexity Results on Array Contraction and Related Problems, Journal of VLSI signal processing systems for signal, image and video technology, vol.40, issue.1, pp.35-55, 2005.
DOI : 10.1007/s11265-005-4937-3

A. Darte, Y. Robert, and F. Vivien, Scheduling and Automatic Parallelization, 2000.
DOI : 10.1007/978-1-4612-1362-8

URL : https://hal.archives-ouvertes.fr/hal-00856645

R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua, Experience in the automatic parallelization of four Perfect-Benchmark programs, 1992.
DOI : 10.1007/BFb0038658

P. Feautrier, Array expansion, Proceedings of the 2nd international conference on Supercomputing, pp.429-441, 1988.
DOI : 10.1145/2591635.2667159

URL : https://hal.archives-ouvertes.fr/hal-01099746

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.
DOI : 10.1007/BF01407931

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992.
DOI : 10.1007/BF01407835

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.
DOI : 10.1007/BF01379404

P. Feautrier, Scalable and Structured Scheduling, International Journal of Parallel Programming, vol.34, issue.5, pp.459-487, 2006.
DOI : 10.1007/s10766-006-0011-4

M. Griebl and J. Collard, Generation of synchronous code for automatic parallelization of while loops, EURO-PAR'95 Parallel Processing, pp.313-326, 1995.
DOI : 10.1007/BFb0020474

T. Grosser, A. Groesslinger, and C. Lengauer, Polly - Performing polyhedral optimizations on a low-level intermediate representation, Parallel Processing Letters, vol.22, issue.4, 2012.

T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege, Hybrid Hexagonal/Classical Tiling for GPUs, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.66-66, 2014.
DOI : 10.1145/2581122.2544160

URL : https://hal.archives-ouvertes.fr/hal-00911177

T. Grosser, S. Verdoolaege, and A. Cohen, Polyhedral AST Generation Is More Than Scanning Polyhedra, ACM Transactions on Programming Languages and Systems, vol.37, issue.4, 2015.
DOI : 10.1145/2743016

URL : https://hal.archives-ouvertes.fr/hal-01257239

M. Gupta, On privatization of variables for data-parallel execution, Proceedings 11th International Parallel Processing Symposium, pp.533-541, 1997.
DOI : 10.1109/IPPS.1997.580952

S. Hack, D. Grund, and G. Goos, Register Allocation for Programs in SSA-Form, CC'06, pp.247-262, 2006.
DOI : 10.1007/11688839_20

J. L. Henning, SPEC CPU2000: measuring CPU performance in the New Millennium, Computer, vol.33, issue.7, pp.28-35, 2000.
DOI : 10.1109/2.869367

J. Holewinski, L. Pouchet, and P. Sadayappan, High-performance code generation for stencil computations on GPU architectures, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.311-320, 2012.
DOI : 10.1145/2304576.2304619

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '88, pp.319-328, 1988.
DOI : 10.1145/73560.73588

A. Jimborean, P. Clauss, J. Dollinger, V. Loechner, and J. M. Caamaño, Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons, International Journal of Parallel Programming, vol.42, issue.4, pp.529-545, 2014.
DOI : 10.1007/s10766-013-0259-4

URL : https://hal.archives-ouvertes.fr/hal-00825738

W. Kelly, W. Pugh, and E. Rosser, Code generation for multiple mappings, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation, p.332, 1995.
DOI : 10.1109/FMPC.1995.380437

K. Kennedy and J. R. Allen, Optimizing compilers for modern architectures: a dependence-based approach, 2002.

Khronos Group, OpenCL 1.2 Specification, 2011.

J. Knoop, O. Rüthing, and B. Steffen, Optimal code motion: theory and practice, ACM Transactions on Programming Languages and Systems, vol.16, issue.4, pp.1117-1155, 1994.
DOI : 10.1145/183432.183443

W. Landi, Undecidability of static analysis, ACM Letters on Programming Languages and Systems, vol.1, issue.4, pp.323-337, 1992.
DOI : 10.1145/161494.161501

C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, CGO 2004, pp.75-88, 2004.
DOI : 10.1109/CGO.2004.1281665

L. Howes and A. Munshi, The OpenCL specification 2.0, 2015.

V. Lefebvre and P. Feautrier, Automatic storage management for parallel programs, Parallel Computing, vol.24, issue.3-4, pp.649-671, 1998.
DOI : 10.1016/S0167-8191(98)00029-5

E. Lenormand and G. Edelin, An industrial perspective: A pragmatic high end signal processing design environment at Thales, SAMOS, pp.52-57, 2003.

Z. Li, Array privatization for parallel execution of loops, Proceedings of the 6th international conference on Supercomputing, pp.313-322, 1992.

A. W. Lim and M. S. Lam, Maximizing parallelism and minimizing synchronization with affine partitions, Parallel Computing, vol.24, issue.3-4, pp.445-475, 1998.
DOI : 10.1016/S0167-8191(98)00021-0

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.7731

A. W. Lim, G. I. Cheong, and M. S. Lam, An affine partitioning algorithm to maximize parallelism and minimize communication, Proceedings of the 13th international conference on Supercomputing, ICS '99, pp.228-237, 1999.
DOI : 10.1145/305138.305197

D. B. Loveman, High Performance Fortran, IEEE Parallel & Distributed Technology: Systems & Applications, pp.25-42, 1993.

M. Luján, T. L. Freeman, and J. R. Gurd, OoLaLa: An object oriented analysis and design of numerical linear algebra, OOPSLA, pp.229-252, 2000.

D. E. Maydan, S. P. Amarasinghe, and M. S. Lam, Array-data flow analysis and its use in array privatization, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '93, pp.2-15, 1993.
DOI : 10.1145/158511.158515

S. Mehta and P. Yew, Improving compiler scalability: optimizing large programs at small price, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.143-152, 2015.
DOI : 10.1145/2813885.2737954

B. Meister, N. Vasilache, D. Wohlford, M. M. Baskaran, A. Leung et al., R-Stream compiler, Encyclopedia of Parallel Computing, pp.1756-1765, 2011.

S. Midkiff, Automatic Parallelization: An Overview of Fundamental Compiler Techniques, Synthesis Lectures on Computer Architecture, vol.7, issue.1, 2012.
DOI : 10.2200/S00340ED1V01Y201201CAC019

A. Monsifrot, F. Bodin, and R. Quiniou, A Machine Learning Approach to Automatic Production of Compiler Heuristics, Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, AIMSA '02, pp.41-50, 2002.
DOI : 10.1007/3-540-46148-5_5

S. S. Muchnick, Advanced compiler design and implementation, 1997.

R. T. Mullapudi, V. Vasista, and U. Bondhugula, PolyMage: Automatic optimization for image processing pipelines, Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pp.429-443, 2015.

NVIDIA, NVIDIA CUDA programming guide 4, 2011.

D. C. Oppen, A $2^{2^{2^{pn}}}$ upper bound on the complexity of Presburger Arithmetic, Journal of Computer and System Sciences, vol.16, issue.3, pp.323-332, 1978.
DOI : 10.1016/0022-0000(78)90021-1

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Silber et al., GRAPHITE: polyhedral analyses and optimizations for GCC, Proceedings of the 2006 GCC Developers' Summit, 2006.

W. Pugh, The Omega test: a fast and practical integer programming algorithm for dependence analysis, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Supercomputing '91, pp.4-13, 1991.
DOI : 10.1145/125826.125848

F. Quilleré and S. Rajopadhye, Optimizing memory usage in the polyhedral model, ACM Trans. on Programming Languages and Systems, vol.22, issue.5, pp.773-815, 2000.

J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, PLDI, pp.519-530, 2013.

G. Ramalingam, The undecidability of aliasing, ACM Transactions on Programming Languages and Systems, vol.16, issue.5, pp.1467-1471, 1994.
DOI : 10.1145/186025.186041

J. Ramanujam and P. Sadayappan, Tiling multidimensional iteration spaces for nonshared memory machines, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Supercomputing '91, pp.111-120, 1991.
DOI : 10.1145/125826.125893

T. Rompf and M. Odersky, Lightweight modular staging: A pragmatic approach to runtime code generation and compiled dsls, Proceedings of the Ninth International Conference on Generative Programming and Component Engineering, GPCE '10, pp.127-136, 2010.

R. Stansifer, Presburger's article on integer arithmetic: Remarks and translation, Technical Report TR84-639, Cornell University, 1984.

M. Stephenson, S. Amarasinghe, M. Martin, and U.-M. O'Reilly, Meta optimization, ACM SIGPLAN Notices, vol.38, issue.5, pp.77-90, 2003.
DOI : 10.1145/780822.781141

M. W. Stephenson, Automating the construction of compiler heuristics using machine learning, 2006.

J. E. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, pp.66-73, 2010.
DOI : 10.1109/MCSE.2010.69

A. Sukumaran-Rajam, J. M. Caamaño, W. Wolff, A. Jimborean, and P. Clauss, Speculative Program Parallelization with Scalable and Decentralized Runtime Verification, Runtime Verification, pp.124-139, 2014.
DOI : 10.1007/978-3-319-11164-3_11

URL : https://hal.archives-ouvertes.fr/hal-01070610

W. Thies, F. Vivien, J. Sheldon, and S. Amarasinghe, A unified framework for schedule and storage optimization, Proc. of the 2001 PLDI Conf, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00808285

K. Trifunovic, A. Cohen, D. Edelsohn, F. Li, T. Grosser et al., GRAPHITE two years after: First lessons learned from Real-World polyhedral compilation, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551516

K. Trifunovic, A. Cohen, R. Ladelski, and F. Li, Elimination of Memory-Based dependences for Loop-Nest optimization and parallelization, 3rd GCC Research Opportunities Workshop, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00992740

P. Tu and D. Padua, Automatic array privatization, Languages and Compilers for Parallel Computing, pp.500-521, 1994.
DOI : 10.1007/3-540-57659-2_29

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.5746

N. Vasilache, Scalable Program Optimization Techniques in the Polyhedral Model, 2007.

N. Vasilache, C. Bastoul, A. Cohen, and S. Girbal, Violated dependence analysis, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.335-344, 2006.
DOI : 10.1145/1183401.1183448

URL : https://hal.archives-ouvertes.fr/hal-01257290

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Mathematical Software - ICMS 2010, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

S. Verdoolaege, Pencil support in pet and PPCG, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01133962

S. Verdoolaege and T. Grosser, Polyhedral extraction tool, IMPACT, 2012.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, 2013.
DOI : 10.1145/2400682.2400713

URL : https://hal.archives-ouvertes.fr/hal-00786677

W. von Hagen, The definitive guide to GCC, 2006.
DOI : 10.1007/978-1-4302-0219-6

N. Wirth, Extended Backus-Naur Form Syntax Specification, 1996.

M. E. Wolf and M. S. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Transactions on Parallel and Distributed Systems, vol.2, issue.4, pp.452-471, 1991.
DOI : 10.1109/71.97902

M. Wolfe, Iteration space tiling for memory hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, pp.357-361, 1989.

D. Wonnacott, Omega Calculator, Encyclopedia of Parallel Computing, pp.978-978, 2011.
DOI : 10.1007/978-0-387-09766-4_2303

R. Baghdadi, A. Cohen, S. Verdoolaege, and K. Trifunovic, Improved loop tiling based on the removal of spurious false dependences, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, 2013.
DOI : 10.1145/2400682.2400711

URL : https://hal.archives-ouvertes.fr/hal-00786674