Polyhedral auto-transformation with no integer linear programming, Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp.529-542, 2018. ,
Scanning polyhedra with do loops, Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '91, pp.39-50, 1991. ,
URL : https://hal.archives-ouvertes.fr/hal-00752774
Opentuner: An extensible framework for program autotuning, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.303-316, 2014. ,
Fast local laplacian filters: Theory and applications, ACM Trans. Graph, vol.33, issue.5, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01063419
A multiresolution spline with application to image mosaics, ACM Trans. Graph, vol.2, issue.4, pp.217-236, 1983. ,
Petsc users manual revision 3.5, Argonne National Laboratory, 2014. ,
The pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests, ACM Trans. Program. Lang. Syst, vol.38, issue.3, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01425546
Unimodular Matrices, pp.21-48, 1993. ,
Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, pp.7-16, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00017260
Polyglot: A polyhedral loop transformation framework for a graphical dataflow language, Proceedings of the 22nd International Conference on Compiler Construction, CC'13, pp.123-143, 2013. ,
Tiling and optimizing time-iterated computations on periodic domains, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.39-50, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01257240
Pencil: A platform-neutral compute intermediate language for accelerator programming, Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT), PACT '15, pp.138-149, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01257236
Smo: An integrated approach to intra-array and inter-array storage optimization, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pp.526-538, 2016. ,
Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.5, pp.1285-1298, 2017. ,
Improved loop tiling based on the removal of spurious false dependences, ACM Trans. Archit. Code Optim, vol.9, issue.4, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00786674
Julia: A fresh approach to numerical computing, SIAM Review, vol.59, issue.1, pp.65-98, 2017. ,
Improving data locality by chunking, Proceedings of the 12th International Conference on Compiler Construction, CC'03, pp.320-334, 2003. ,
URL : https://hal.archives-ouvertes.fr/inria-00001055
Implementing sparse matrix-vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, vol.18, pp.1-18, 2009. ,
The combinatorial blas: Design, implementation, and applications, Int. J. High Perform. Comput. Appl, vol.25, issue.4, pp.496-509, 2011. ,
A model for fusion and code motion in an automatic parallelizing compiler, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp.343-352, 2010. ,
Compiler transformations for high-performance computing, ACM Comput. Surv, vol.26, issue.4, pp.345-420, 1994. ,
A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pp.101-113, 2008. ,
The parma polyhedra library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems, Sci. Comput. Program, vol.72, issue.1-2, pp.3-21, 2008. ,
Polycheck: Dynamic verification of iteration space transformations on affine programs, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pp.539-554, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01234104
Effective Automatic Parallelization and Locality Optimization Using the Polyhedral Model, 2008. ,
Compiling affine loop nests for distributed-memory parallel architectures, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, vol.33, pp.1-33, 2013. ,
Tiling stencil computations to maximize parallelism, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, vol.40, 2012. ,
, Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, CC'10/ETAPS'10, pp.283-303, 2010.
Tiramisu: A code optimization framework for high performance systems, CoRR, 2018. ,
Automatic mapping of nested loops to fpgas, Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, pp.101-111, 2007. ,
Fuzzy array dataflow analysis, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '95, pp.92-101, 1995. ,
Chill: A framework for composing high-level loop transformations, 2008. ,
Polyhedra scanning revisited, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12 ,
, Clan: A polyhedral representation extraction tool for c-based high level languages, pp.2018-2026
TVM: end-to-end optimization stack for deep learning, 2018. ,
Space-time transformation of while-loops using speculative execution, Proceedings of IEEE Scalable High Performance Computing Conference, pp.429-436, 1994. ,
Automatic parallelization of while-loops using speculative execution, Int. J. Parallel Program, vol.23, issue.2, pp.191-219, 1995. ,
Real-time edge-aware image processing with the bilateral grid, ACM SIGGRAPH 2007 Papers, SIGGRAPH '07, 2007. ,
Facilitating the search for compositions of program transformations, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.151-160, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-01257296
The university of florida sparse matrix collection, ACM Trans. Math. Softw, vol.38, issue.1, 2011. ,
Transforming loop chains via macro dataflow graphs, Proceedings of the 2018 International Symposium on Code Generation and Optimization, CGO 2018, pp.265-277, 2018. ,
Lattice-based memory allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-02101912
The triangle method for saving startup time in parallel computers, Distributed Memory Computing Conference, pp.568-572, 1990. ,
Parametric integer programming, RAIRO-Operations Research, vol.22, issue.3, pp.243-268, 1988. ,
Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991. ,
Some efficient solutions to the affine scheduling problem. i. one-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992. ,
Some efficient solutions to the affine scheduling problem. part ii. multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992. ,
, Polyhedron Model, pp.1581-1592, 2011.
Generation of synchronous code for automatic parallelization of while loops, EURO-PAR '95 Parallel Processing, pp.313-326, 1995. ,
Hybrid hexagonal/classical tiling for gpus, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.66, pp.66-66, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911177
Split tiling for gpus: Automatic parallelization using trapezoidal tiles, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pp.24-31, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00786812
Index set splitting, Int. J. Parallel Program, vol.28, issue.6, pp.607-631, 2000. ,
A scheme for detecting the termination of a parallel loop nest, Proc. GI/ITG FG PARS, vol.98, 1998. ,
Termination detection in parallel loop nests with while loops, Parallel Computing, vol.25, issue.12, pp.1489-1510, 1999. ,
Polly-performing polyhedral optimizations on a low-level intermediate representation, Parallel Processing Letters, vol.22, issue.04, p.1250010, 2012. ,
On scanning space-time mapped while loops, Parallel Processing: CONPAR 94 - VAPP VI, pp.677-688, 1994. ,
A decoupled approach to high-level loop optimization: tile shapes, polyhedral building blocks and low-level compilers, 2014. ,
URL : https://hal.archives-ouvertes.fr/tel-01144563
Modeling the performance of geometric multigrid stencils on multicore computer architectures, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.194-216, 2015. ,
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies, Int. J. Parallel Program, vol.34, issue.3, pp.261-317, 2006. ,
Polyhedral ast generation is more than scanning polyhedra, ACM Trans. Program. Lang. Syst, vol.37, issue.4, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01257239
The relation between diamond tiling and hexagonal tiling, Parallel Processing Letters, vol.24, issue.03, p.1441002, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01257248
High-performance code generation for stencil computations on gpu architectures, Proceedings of the 26th ACM International Conference on Supercomputing, ICS '12, pp.311-320, 2012. ,
A combined corner and edge detector, Alvey vision conference, vol.15, pp.10-5244, 1988. ,
A stencil compiler for short-vector simd architectures, Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS '13, pp.13-24, 2013. ,
, The ansi c standard (c99), 1999.
Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988. ,
An effective fusion and tile size model for optimizing image processing pipelines, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '18, pp.261-275, 2018. ,
Dynamic and speculative polyhedral parallelization using compiler-generated skeletons, Int. J. Parallel Program, vol.42, issue.4, pp.529-545, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00825738
Adaptive mapping and parameter selection scheme to improve automatic code generation for gpus, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.251, pp.251-251, 2014. ,
Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2002. ,
Effective automatic parallelization of stencil computations, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '07, pp.235-244, 2007. ,
The tensor algebra compiler, Proc. ACM Program. Lang, vol.1, 2017. ,
The omega calculator and library, version 1.1.0, p.18, 1996. ,
The organization of computations for uniform recurrence equations, J. ACM, vol.14, issue.3, pp.563-590, 1967. ,
A unifying framework for iteration reordering transformations, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing, vol.1, pp.153-162, 1995. ,
Compiler/runtime framework for dynamic dataflow parallelization of tiled programs, ACM Trans. Archit. Code Optim, vol.11, issue.4, 2015. ,
Code generation for multiple mappings, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation, pp.332-341, 1995. ,
Sanjay Rajopadhye, and Michelle Mills Strout. Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC '07, vol.51, pp.1-51, 2007. ,
When polyhedral transformations meet simd code generation, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.127-138, 2013. ,
Distributed graphlab: A framework for machine learning and data mining in the cloud, Proc. VLDB Endow, vol.5, issue.8, pp.716-727, 2012. ,
Maximizing parallelism and minimizing synchronization with affine transforms, Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '97, pp.201-214, 1997. ,
R-stream: A parametric high level compiler, High Performance Embedded Computing Workshop, 2006. ,
Polylib: A library for manipulating parameterized polyhedra, 1999. ,
Parameterized polyhedra and their vertices, Int. J. Parallel Program, vol.25, issue.6, pp.525-549, 1997. ,
URL : https://hal.archives-ouvertes.fr/inria-00534851
Array-data flow analysis and its use in array privatization, Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '93, pp.2-15, 1993. ,
Lazy array data-flow dependence analysis, Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '94, pp.311-325, 1994. ,
Automatically scheduling halide image processing pipelines, ACM Trans. Graph, vol.35, issue.4, 2016. ,
Optimizing sparse matrix-vector product computations using unroll and jam, Int. J. High Perform. Comput. Appl, vol.18, issue.2, pp.225-236, 2004. ,
Multidimensional intratile parallelization for memory-starved stencil computations, ACM Trans. Parallel Comput, vol.4, issue.3, 2017. ,
Revisiting loop fusion in the polyhedral framework, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.233-246, 2014. ,
Polymage: Automatic optimization for image processing pipelines, Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pp.429-443, 2015. ,
Improving compiler scalability: Optimizing large programs at small price, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '15, pp.143-152, 2015. ,
Mapping the fdtd application to manycore chip architectures, International Conference on Parallel Processing, pp.309-316, 2009. ,
An optimizing code generator for a class of lattice-boltzmann computations, ACM Trans. Archit. Code Optim, vol.12, issue.2, 2015. ,
Local laplacian filters: Edge-aware image processing with a laplacian pyramid, Commun. ACM, vol.58, issue.3, pp.81-91, 2015. ,
Bilateral filtering: Theory and applications, Foundations and Trends® in Computer Graphics and Vision, vol.4, pp.1-73, 2009. ,
The omega test: A fast and practical integer programming algorithm for dependence analysis, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing '91, pp.4-13, 1991. ,
Eliminating false data dependences using the omega test, Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, PLDI '92, pp.140-151, 1992. ,
An exact method for analysis of value-based array data dependences, Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pp.546-566, 1994. ,
Nonlinear array dependence analysis, 1994. ,
Static analysis of upper and lower bounds on dependences and parallelism, ACM Trans. Program. Lang. Syst, vol.16, issue.4, pp.1248-1278, 1994. ,
Polyhedral-based data reuse optimization for configurable computing, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA '13, pp.29-38, 2013. ,
Generation of efficient nested loops from polyhedra, International Journal of Parallel Programming, vol.28, issue.5, pp.469-498, 2000. ,
Effective automatic computation placement and data allocation for parallelization of regular programs, Proceedings of the 28th ACM International Conference on Supercomputing, ICS '14, pp.13-22, 2014. ,
Distributed memory code generation for mixed irregular/regular computations, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.65-75, 2015. ,
Decoupling algorithms from schedules for easy optimization of image processing pipelines, ACM Trans. Graph, vol.31, issue.4, 2012. ,
Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.519-530, 2013. ,
Reduction drawing: Language constructs and polyhedral compilation for reductions on gpu, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT '16, pp.87-97, 2016. ,
Parallelizing while loops for multiprocessor systems, Proceedings of 9th International Parallel Processing Symposium, pp.347-356, 1995. ,
Compile-time composition of run-time data and iteration reorderings, Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI '03, pp.91-102, 2003. ,
Schedule-independent storage mapping for loops, Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS VIII, pp.24-33, 1998. ,
Locality aware concurrent start for stencil applications, Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '15, pp.157-166, 2015. ,
Set and relation manipulation for the sparse polyhedral framework, Languages and Compilers for Parallel Computing, pp.61-75, 2013. ,
Optimized two-level parallelization for gpu accelerators using the polyhedral model, Proceedings of the 26th International Conference on Compiler Construction, pp.22-33, 2017. ,
An approach for code generation in the sparse polyhedral framework, Parallel Comput, vol.53, issue.C, pp.32-57, 2016. ,
Simplification and runtime resolution of data dependence constraints for loop transformations, Proceedings of the International Conference on Supercomputing, ICS '17, vol.10, pp.1-10, 2017. ,
The polyhedral model of nonlinear loops, ACM Trans. Archit. Code Optim, vol.12, issue.4, p.27, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01244464
Cache accurate time skewing in iterative stencil computations, Proceedings of the 2011 International Conference on Parallel Processing, ICPP '11, pp.571-581, 2011. ,
Polyhedral optimizations for a data-flow graph language, Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing, vol.9519, pp.57-72, 2016. ,
GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation, GCC Research Opportunities Workshop (GROW'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551516
Polyhedral-model guided loop-nest auto-vectorization, Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, PACT '09, pp.327-337, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00645325
Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra, Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, pp.483-496, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00911888
Polyhedral code generation in the real world, Compiler Construction, pp.185-201, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00001106
Violated dependence analysis, Proceedings of the 20th Annual International Conference on Supercomputing, ICS '06, pp.335-344, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-01257290
José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. Polyhedral parallel code generation for cuda, ACM Trans. Archit. Code Optim, vol.9, issue.4, 2013. ,
Oski: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, vol.16, p.521, 2005. ,
Isl: An integer set library for the polyhedral model, Proceedings of the Third International Congress Conference on Mathematical Software, ICMS'10, pp.299-302, 2010. ,
Polyhedral extraction tool, Second Int. Workshop on Polyhedral Compilation Techniques (IMPACT'12), 2012. ,
Loop and data transformations for sparse matrix code, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '15, pp.521-532, 2015. ,
Automating wavefront parallelization for sparse matrix computations, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '16, vol.41, pp.1-41, 2016. ,
Non-affine extensions to polyhedral code generation, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, vol.185, pp.185-185, 2014. ,
Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions, 2018. ,
A loop transformation theory and an algorithm to maximize parallelism, IEEE Trans. Parallel Distrib. Syst, vol.2, issue.4, pp.452-471, 1991. ,
Theory and algorithm for generalized memory partitioning in high-level synthesis, Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, FPGA '14, pp.199-208, 2014. ,
Alphaz: A system for design space exploration in the polyhedral model, Languages and Compilers for Parallel Computing, pp.17-31, 2013. ,
Hierarchical overlapped tiling, Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO '12, pp.207-218, 2012. ,
Deepdsl: A compilation-based domain-specific language for deep learning, 2017. ,
Improving polyhedral code generation for high-level synthesis, Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS '13, vol.15, pp.1-15, 2013. ,