A. Acharya, U. Bondhugula, and A. Cohen, Polyhedral auto-transformation with no integer linear programming, Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp.529-542, 2018.

C. Ancourt and F. Irigoin, Scanning polyhedra with do loops, Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '91, pp.39-50, 1991.
URL : https://hal.archives-ouvertes.fr/hal-00752774

J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom et al., OpenTuner: An extensible framework for program autotuning, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.303-316, 2014.

M. Aubry, S. Paris, S. W. Hasinoff, J. Kautz, and F. Durand, Fast local laplacian filters: Theory and applications, ACM Trans. Graph, vol.33, issue.5, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01063419

P. J. Burt and E. H. Adelson, A multiresolution spline with application to image mosaics, ACM Trans. Graph, vol.2, issue.4, pp.217-236, 1983.

S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune et al., PETSc users manual revision 3.5, Argonne National Laboratory, 2014.

U. Bondhugula, A. Acharya, and A. Cohen, The pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests, ACM Trans. Program. Lang. Syst, vol.38, issue.3, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01425546

U. Banerjee, Unimodular Matrices, pp.21-48, 1993.

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, pp.7-16, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00017260

S. G. Bhaskaracharya and U. Bondhugula, PolyGLoT: A polyhedral loop transformation framework for a graphical dataflow language, Proceedings of the 22nd International Conference on Compiler Construction, CC'13, pp.123-143, 2013.

U. Bondhugula, V. Bandishti, and A. Cohen, Tiling and optimizing time-iterated computations on periodic domains, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.39-50, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01257240

R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse et al., Pencil: A platform-neutral compute intermediate language for accelerator programming, Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT), PACT '15, pp.138-149, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01257236

S. G. Bhaskaracharya, U. Bondhugula, and A. Cohen, SMO: An integrated approach to intra-array and inter-array storage optimization, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pp.526-538, 2016.

U. Bondhugula, V. Bandishti, and I. Pananilath, Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.5, pp.1285-1298, 2017.

R. Baghdadi, A. Cohen, S. Verdoolaege, and K. Trifunović, Improved loop tiling based on the removal of spurious false dependences, ACM Trans. Archit. Code Optim, vol.9, issue.4, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00786674

J. Bezanson, A. Edelman, S. Karpinski, and V. Shah, Julia: A fresh approach to numerical computing, SIAM Review, vol.59, issue.1, pp.65-98, 2017.

C. Bastoul and P. Feautrier, Improving data locality by chunking, Proceedings of the 12th International Conference on Compiler Construction, CC'03, pp.320-334, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00001055

N. Bell and M. Garland, Implementing sparse matrix-vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, vol.18, pp.1-18, 2009.

A. Buluc and J. Gilbert, The combinatorial blas: Design, implementation, and applications, Int. J. High Perform. Comput. Appl, vol.25, issue.4, pp.496-509, 2011.

U. Bondhugula, O. Gunluk, S. Dash, and L. Renganarayanan, A model for fusion and code motion in an automatic parallelizing compiler, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp.343-352, 2010.

D. F. Bacon, S. L. Graham, and O. J. Sharp, Compiler transformations for high-performance computing, ACM Comput. Surv, vol.26, issue.4, pp.345-420, 1994.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pp.101-113, 2008.

R. Bagnara, P. M. Hill, and E. Zaffanella, The parma polyhedra library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems, Sci. Comput. Program, vol.72, issue.1-2, pp.3-21, 2008.

W. Bao, S. Krishnamoorthy, L. Pouchet, F. Rastello, and P. Sadayappan, Polycheck: Dynamic verification of iteration space transformations on affine programs, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pp.539-554, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01234104

U. Bondhugula, Effective Automatic Parallelization and Locality Optimization Using the Polyhedral Model, 2008.

U. Bondhugula, Compiling affine loop nests for distributed-memory parallel architectures, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, vol.33, pp.1-33, 2013.

V. Bandishti, I. Pananilath, and U. Bondhugula, Tiling stencil computations to maximize parallelism, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, vol.40, 2012.

M. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul, The polyhedral model is more widely applicable than you think, Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, CC'10/ETAPS'10, pp.283-303, 2010.

R. Baghdadi, J. Ray, M. Ben Romdhane, E. Del Sozzo, P. Suriana et al., Tiramisu: A code optimization framework for high performance systems, CoRR, 2018.

U. Bondhugula, J. Ramanujam, and P. Sadayappan, Automatic mapping of nested loops to fpgas, Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, pp.101-111, 2007.

J. Collard, D. Barthou, and P. Feautrier, Fuzzy array dataflow analysis, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '95, pp.92-101, 1995.

C. Chen, J. Chame, and M. Hall, Chill: A framework for composing high-level loop transformations, 2008.

C. Chen, Polyhedra scanning revisited, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12

C. Bastoul, Clan: A polyhedral representation extraction tool for C-based high level languages, pp.2018-2026

T. Chen, T. Moreau, Z. Jiang, H. Shen, E. Q. Yan et al., TVM: end-to-end optimization stack for deep learning, 2018.

J. Collard, Space-time transformation of while-loops using speculative execution, Proceedings of IEEE Scalable High Performance Computing Conference, pp.429-436, 1994.

J. Collard, Automatic parallelization of while-loops using speculative execution, Int. J. Parallel Program, vol.23, issue.2, pp.191-219, 1995.

J. Chen, S. Paris, and F. Durand, Real-time edge-aware image processing with the bilateral grid, ACM SIGGRAPH 2007 Papers, SIGGRAPH '07, 2007.

A. Cohen, M. Sigler, S. Girbal, O. Temam, D. Parello et al., Facilitating the search for compositions of program transformations, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.151-160, 2005.
URL : https://hal.archives-ouvertes.fr/hal-01257296

T. A. Davis and Y. Hu, The university of florida sparse matrix collection, ACM Trans. Math. Softw, vol.38, issue.1, 2011.

E. C. Davis, M. M. Strout, and C. Olschanowsky, Transforming loop chains via macro dataflow graphs, Proceedings of the 2018 International Symposium on Code Generation and Optimization, CGO 2018, pp.265-277, 2018.

A. Darte, R. Schreiber, and G. Villard, Lattice-based memory allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005.
URL : https://hal.archives-ouvertes.fr/hal-02101912

H. Eissfeller and S. Muller, The triangle method for saving startup time in parallel computers, Distributed Memory Computing Conference, pp.568-572, 1990.

P. Feautrier, Parametric integer programming, RAIRO-Operations Research, vol.22, issue.3, pp.243-268, 1988.

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992.

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.

P. Feautrier and C. Lengauer, Polyhedron Model, pp.1581-1592, 2011.

M. Griebl and J. Collard, Generation of synchronous code for automatic parallelization of while loops, EURO-PAR '95 Parallel Processing, pp.313-326, 1995.

T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege, Hybrid hexagonal/classical tiling for gpus, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.66, pp.66-66, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911177

T. Grosser, A. Cohen, P. H. J. Kelly, J. Ramanujam, P. Sadayappan et al., Split tiling for GPUs: Automatic parallelization using trapezoidal tiles, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pp.24-31, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00786812

M. Griebl, P. Feautrier, and C. Lengauer, Index set splitting, Int. J. Parallel Program, vol.28, issue.6, pp.607-631, 2000.

M. Geigl, M. Griebl, and C. Lengauer, A scheme for detecting the termination of a parallel loop nest, Proc. GI/ITG FG PARS, vol.98, 1998.

M. Geigl, M. Griebl, and C. Lengauer, Termination detection in parallel loop nests with while loops, Parallel Computing, vol.25, issue.12, pp.1489-1510, 1999.

T. Grosser, A. Groesslinger, and C. Lengauer, Polly-performing polyhedral optimizations on a low-level intermediate representation, Parallel Processing Letters, vol.22, issue.04, p.1250010, 2012.

M. Griebl and C. Lengauer, On scanning space-time mapped while loops, Parallel Processing: CONPAR 94 -VAPP VI, pp.677-688, 1994.

T. Grosser, A decoupled approach to high-level loop optimization: tile shapes, polyhedral building blocks and low-level compilers, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01144563

P. Ghysels and W. Vanroose, Modeling the performance of geometric multigrid stencils on multicore computer architectures, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.194-216, 2015.

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello et al., Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies, Int. J. Parallel Program, vol.34, issue.3, pp.261-317, 2006.

T. Grosser, S. Verdoolaege, and A. Cohen, Polyhedral ast generation is more than scanning polyhedra, ACM Trans. Program. Lang. Syst, vol.37, issue.4, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01257239

T. Grosser, S. Verdoolaege, A. Cohen, and P. Sadayappan, The relation between diamond tiling and hexagonal tiling, Parallel Processing Letters, vol.24, issue.03, p.1441002, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01257248

J. Holewinski, L. Pouchet, and P. Sadayappan, High-performance code generation for stencil computations on gpu architectures, Proceedings of the 26th ACM International Conference on Supercomputing, ICS '12, pp.311-320, 2012.

C. Harris and M. Stephens, A combined corner and edge detector, Alvey Vision Conference, vol.15, 1988.

T. Henretty, R. Veras, F. Franchetti, L. Pouchet, J. Ramanujam et al., A stencil compiler for short-vector SIMD architectures, Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS '13, pp.13-24, 2013.

The ANSI C standard (C99), 1999.

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988.

A. Jangda and U. Bondhugula, An effective fusion and tile size model for optimizing image processing pipelines, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '18, pp.261-275, 2018.

A. Jimborean, P. Clauss, J. Dollinger, V. Loechner, and J. Caamaño, Dynamic and speculative polyhedral parallelization using compiler-generated skeletons, Int. J. Parallel Program, vol.42, issue.4, pp.529-545, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00825738

J. C. Juega, J. I. Gomez, C. Tenllado, and F. Catthoor, Adaptive mapping and parameter selection scheme to improve automatic code generation for gpus, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.251, pp.251-251, 2014.

K. Kennedy and J. R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2002.

S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev et al., Effective automatic parallelization of stencil computations, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '07, pp.235-244, 2007.

F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The tensor algebra compiler, Proc. ACM Program. Lang, vol.1, 2017.

W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman et al., The Omega calculator and library, version 1.1.0, p.18, 1996.

R. M. Karp, R. E. Miller, and S. Winograd, The organization of computations for uniform recurrence equations, J. ACM, vol.14, issue.3, pp.563-590, 1967.

W. Kelly and W. Pugh, A unifying framework for iteration reordering transformations, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing, vol.1, pp.153-162, 1995.

M. Kong, A. Pop, L. Pouchet, R. Govindarajan, A. Cohen et al., Compiler/runtime framework for dynamic dataflow parallelization of tiled programs, ACM Trans. Archit. Code Optim, vol.11, issue.4, 2015.

W. Kelly, W. Pugh, and E. Rosser, Code generation for multiple mappings, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation, pp.332-341, 1995.

D. Kim, L. Renganarayanan, D. Rostron, S. Rajopadhye, and M. M. Strout, Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC '07, vol.51, pp.1-51, 2007.

M. Kong, R. Veras, K. Stock, F. Franchetti, L. Pouchet et al., When polyhedral transformations meet simd code generation, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.127-138, 2013.

Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola et al., Distributed graphlab: A framework for machine learning and data mining in the cloud, Proc. VLDB Endow, vol.5, issue.8, pp.716-727, 2012.

A. W. Lim and M. S. Lam, Maximizing parallelism and minimizing synchronization with affine transforms, Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '97, pp.201-214, 1997.

B. Meister, R. Lethin, A. Leung, and E. Schweitz, R-stream: A parametric high level compiler, High Performance Embedded Computing Workshop, 2006.

V. Loechner, Polylib: A library for manipulating parameterized polyhedra, 1999.

V. Loechner and D. K. Wilde, Parameterized polyhedra and their vertices, Int. J. Parallel Program, vol.25, issue.6, pp.525-549, 1997.
URL : https://hal.archives-ouvertes.fr/inria-00534851

D. E. Maydan, S. P. Amarasinghe, and M. S. Lam, Array-data flow analysis and its use in array privatization, Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '93, pp.2-15, 1993.

V. Maslov, Lazy array data-flow dependence analysis, Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '94, pp.311-325, 1994.

R. T. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, and K. Fatahalian, Automatically scheduling Halide image processing pipelines, ACM Trans. Graph, vol.35, issue.4, 2016.

J. Mellor-Crummey and J. Garvin, Optimizing sparse matrix-vector product computations using unroll and jam, Int. J. High Perform. Comput. Appl, vol.18, issue.2, pp.225-236, 2004.

T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes, Multidimensional intratile parallelization for memory-starved stencil computations, ACM Trans. Parallel Comput, vol.4, issue.3, 2017.

S. Mehta, P. Lin, and P. Yew, Revisiting loop fusion in the polyhedral framework, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.233-246, 2014.

R. T. Mullapudi, V. Vasista, and U. Bondhugula, PolyMage: Automatic optimization for image processing pipelines, Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pp.429-443, 2015.

S. Mehta and P. Yew, Improving compiler scalability: Optimizing large programs at small price, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '15, pp.143-152, 2015.

D. A. Orozco and G. R. Gao, Mapping the FDTD application to many-core chip architectures, International Conference on Parallel Processing, pp.309-316, 2009.

I. Pananilath, A. Acharya, V. Vasista, and U. Bondhugula, An optimizing code generator for a class of lattice-boltzmann computations, ACM Trans. Archit. Code Optim, vol.12, issue.2, 2015.

S. Paris, S. W. Hasinoff, and J. Kautz, Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid, Commun. ACM, vol.58, issue.3, pp.81-91, 2015.

S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, Bilateral filtering: Theory and applications. Foundations and Trends® in Computer Graphics and Vision, vol.4, pp.1-73, 2009.

W. Pugh, The omega test: A fast and practical integer programming algorithm for dependence analysis, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing '91, pp.4-13, 1991.

W. Pugh and D. Wonnacott, Eliminating false data dependences using the omega test, Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, PLDI '92, pp.140-151, 1992.

W. Pugh and D. Wonnacott, An exact method for analysis of value-based array data dependences, Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pp.546-566, 1994.

W. Pugh and D. Wonnacott, Nonlinear array dependence analysis, 1994.

W. Pugh and D. Wonnacott, Static analysis of upper and lower bounds on dependences and parallelism, ACM Trans. Program. Lang. Syst, vol.16, issue.4, pp.1248-1278, 1994.

L. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, Polyhedral-based data reuse optimization for configurable computing, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA '13, pp.29-38, 2013.

F. Quilleré, S. Rajopadhye, and D. Wilde, Generation of efficient nested loops from polyhedra, International Journal of Parallel Programming, vol.28, issue.5, pp.469-498, 2000.

C. Reddy and U. Bondhugula, Effective automatic computation placement and data allocation for parallelization of regular programs, Proceedings of the 28th ACM International Conference on Supercomputing, ICS '14, pp.13-22, 2014.

M. Ravishankar, R. Dathathri, V. Elango, L. Pouchet, J. Ramanujam et al., Distributed memory code generation for mixed irregular/regular computations, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.65-75, 2015.

J. Ragan-kelley, A. Adams, S. Paris, M. Levoy, S. Amarasinghe et al., Decoupling algorithms from schedules for easy optimization of image processing pipelines, ACM Trans. Graph, vol.31, issue.4, 2012.

J. Ragan-kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.519-530, 2013.

C. Reddy, M. Kruse, and A. Cohen, Reduction drawing: Language constructs and polyhedral compilation for reductions on gpu, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT '16, pp.87-97, 2016.

L. Rauchwerger and D. Padua, Parallelizing while loops for multiprocessor systems, Proceedings of 9th International Parallel Processing Symposium, pp.347-356, 1995.

M. M. Strout, L. Carter, and J. Ferrante, Compile-time composition of run-time data and iteration reorderings, Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI '03, pp.91-102, 2003.

M. M. Strout, L. Carter, J. Ferrante, and B. Simon, Schedule-independent storage mapping for loops, Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS VIII, pp.24-33, 1998.

S. Shrestha, G. R. Gao, J. Manzano, A. Marquez, and J. Feo, Locality aware concurrent start for stencil applications, Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '15, pp.157-166, 2015.

M. M. Strout, G. Georg, and C. Olschanowsky, Set and relation manipulation for the sparse polyhedral framework, Languages and Compilers for Parallel Computing, pp.61-75, 2013.

J. Shirako, A. Hayashi, and V. Sarkar, Optimized two-level parallelization for gpu accelerators using the polyhedral model, Proceedings of the 26th International Conference on Compiler Construction, pp.22-33, 2017.

M. M. Strout, A. Lamielle, L. Carter, J. Ferrante, B. Kreaseck et al., An approach for code generation in the sparse polyhedral framework, Parallel Comput, vol.53, issue.C, pp.32-57, 2016.

D. N. Sampaio, L. Pouchet, and F. Rastello, Simplification and runtime resolution of data dependence constraints for loop transformations, Proceedings of the International Conference on Supercomputing, ICS '17, vol.10, pp.1-10, 2017.

A. Sukumaran-rajam and P. Clauss, The polyhedral model of nonlinear loops, ACM Trans. Archit. Code Optim, vol.12, issue.4, p.27, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01244464

R. Strzodka, M. Shaheen, D. Pajak, and H. Seidel, Cache accurate time skewing in iterative stencil computations, Proceedings of the 2011 International Conference on Parallel Processing, ICPP '11, pp.571-581, 2011.

A. Sbîrlea, J. Shirako, L. Pouchet, and V. Sarkar, Polyhedral optimizations for a data-flow graph language, Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing, vol.9519, pp.57-72, 2016.

K. Trifunovic, A. Cohen, D. Edelsohn, F. Li, T. Grosser et al., GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation, GCC Research Opportunities Workshop (GROW'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551516

K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, Polyhedral-model guided loop-nest auto-vectorization, Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, PACT '09, pp.327-337, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00645325

R. Upadrasta and A. Cohen, Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra, Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, pp.483-496, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00911888

N. Vasilache, C. Bastoul, and A. Cohen, Polyhedral code generation in the real world, Compiler Construction, pp.185-201, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00001106

N. Vasilache, C. Bastoul, A. Cohen, and S. Girbal, Violated dependence analysis, Proceedings of the 20th Annual International Conference on Supercomputing, ICS '06, pp.335-344, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01257290

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado, and F. Catthoor, Polyhedral parallel code generation for CUDA, ACM Trans. Archit. Code Optim, vol.9, issue.4, 2013.

R. Vuduc, J. W. Demmel, and K. A. Yelick, OSKI: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, vol.16, p.521, 2005.

S. Verdoolaege, Isl: An integer set library for the polyhedral model, Proceedings of the Third International Congress Conference on Mathematical Software, ICMS'10, pp.299-302, 2010.

S. Verdoolaege and T. Grosser, Polyhedral extraction tool, Second Int. Workshop on Polyhedral Compilation Techniques (IMPACT'12), 2012.

A. Venkat, M. Hall, and M. Strout, Loop and data transformations for sparse matrix code, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '15, pp.521-532, 2015.

A. Venkat, M. S. Mohammadi, J. Park, H. Rong, R. Barik et al., Automating wavefront parallelization for sparse matrix computations, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '16, vol.41, pp.1-41, 2016.

A. Venkat, M. Shantharam, M. Hall, and M. M. Strout, Nonaffine extensions to polyhedral code generation, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.185, pp.185-185, 2014.

N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. Devito et al., Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions, 2018.

M. E. Wolf and M. S. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Trans. Parallel Distrib. Syst, vol.2, issue.4, pp.452-471, 1991.

Y. Wang, P. Li, and J. Cong, Theory and algorithm for generalized memory partitioning in high-level synthesis, Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, FPGA '14, pp.199-208, 2014.

T. Yuki, G. Gupta, D. Kim, T. Pathan, and S. Rajopadhye, Alphaz: A system for design space exploration in the polyhedral model, Languages and Compilers for Parallel Computing, pp.17-31, 2013.

X. Zhou, J. Giacalone, M. J. Garzarán, R. H. Kuhn, Y. Ni et al., Hierarchical overlapped tiling, Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO '12, pp.207-218, 2012.

T. Zhao, X. Huang, and Y. Cao, DeepDSL: A compilation-based domain-specific language for deep learning, 2017.

W. Zuo, P. Li, D. Chen, L. Pouchet, S. Zhong et al., Improving polyhedral code generation for high-level synthesis, Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS '13, vol.15, pp.1-15, 2013.