Parallel birth and death process for cell nuclei extraction in histopathology images, Parallel Processing (ICPP), 2013 42nd International Conference on, pp.429-438, 2013. ,
The bhattacharyya metric as an absolute similarity measure for frequency coded data, vol.34, pp.363-368, 1998. ,
, Generating loops for scanning polyhedra. PRiSM, Versailles University, vol.23, 2002.
Improving Data Locality in Static Control Programs, 2004. ,
Extracting polyhedral representation from high level languages. Tech. rep. Related to the Clan tool, 2008. ,
Openscop: A specification and a library for data exchange in polyhedral compilation tools, 2011. ,
Contributions to high-level program optimization, 2012. ,
Large-scale simulation of elastic wave propagation in heterogeneous media on parallel computers, Cloog: The chunky loop generator, vol.5, pp.85-102, 1991. ,
Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.5, pp.1285-1298, 2017. ,
Bio-molecular dynamics comes of age, Science, vol.271, issue.5251, pp.954-954, 1996. ,
A practical automatic polyhedral parallelizer and locality optimizer, In ACM SIGPLAN Notices, vol.43, pp.101-113, 2008. ,
Adaptive mesh refinement for hyperbolic partial differential equations, Journal of computational Physics, vol.53, issue.3, pp.484-512, 1984. ,
Pluto-an automatic parallelizer and locality optimizer for multicores, 2009. ,
Compiling affine loop nests for distributed-memory parallel architectures, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p.33, 2013. ,
Salinitydriven thermocline transients in a wind-and thermohaline-forced isopycnic coordinate model of the north atlantic, Journal of Physical Oceanography, vol.22, issue.12, pp.1486-1505, 1992. ,
High performance spectral simulation of turbulent flows in massively parallel machines with distributed memory. The International journal of supercomputer applications and high performance computing, vol.9, pp.187-204, 1995. ,
Chill: A framework for composing high-level loop transformations, 2008. ,
Experimenting iterative computations with ordered read-write locks, éditeurs : 18th Euromicro International Conference on Parallel, Distributed and network-based Processing, pp.155-162, 2010. ,
Iterative computations with ordered read-write locks, Journal of Parallel and Distributed Computing, vol.70, issue.5, pp.496-504, 2010. ,
Accelerating fluid registration algorithm on multi-FPGA platforms, Field Programmable Logic and Applications (FPL), 2011 International Conference on, pp.50-57, 2011. ,
A fast double precision cfd code using cuda. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, pp.414-429, 2009. ,
Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures, Maarten PAULIDES et Helmar BURKHART : Manycore stencil computations in hyperthermia applications. Scientific Computing with Multicore and Accelerators, pp.255-277, 2010. ,
Lithographic aerial image simulation with FPGAbased hardware acceleration, Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, pp.67-76, 2008. ,
Cache optimization for structured and unstructured grid multigrid, Electronic Transactions on Numerical Analysis, vol.10, pp.21-40, 2000. ,
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, Computer organization and design: the hardware/software interface. San mateo, CA: M organ Kaufmann Publishers, 1:998, vol.51, p.122, 1999. ,
Generating efficient data movement code for heterogeneous architectures with distributed-memory, Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on, pp.375-386, 2013. ,
Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations, Journal of Systems and Software, vol.125, pp.389-400, 2017. ,
The design and implementation of fftw3, Proceedings of the IEEE, vol.93, issue.2, pp.216-231, 2005. ,
Cache-oblivious algorithms, Foundations of Computer Science, 1999. 40th Annual Symposium on, pp.285-297, 1999. ,
Manticore: A heterogeneous parallel language, Proceedings of the 2007 workshop on Declarative aspects of multicore programming, pp.37-44, 2007. ,
Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations, Proc. of the 19th ACM International Conference on Supercomputing (ICS05), 2005. ,
Split tiling for gpus: automatic parallelization using trapezoidal tiles, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, vol.14, pp.24-31, 2003. ,
Relaxed synchronization with ordered read-write locks, éditeurs : Euro-Par 2011: Parallel Processing Workshops, vol.7155, pp.387-397, 2011. ,
Fully-abstracted affinity optimization for task-based models ,
Optimizing locality by topology-aware placement for a task based programming model, Cluster Computing (CLUSTER), 2016 IEEE International Conference on, pp.164-165, 2016. ,
Improving the arithmetic intensity of multigrid with the help of polynomial smoothers, Numerical Linear Algebra with Applications, vol.19, issue.2, pp.253-267, 2012. ,
Resource Centered Computing delivering high parallel performance, Heterogeneity in Computing Workshop (HCW 2014), workshop of 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014), 2014. ,
Loop transformation recipes for code generation and auto-tuning, International Workshop on Languages and Compilers for Parallel Computing, pp.50-64, 2009. ,
Vmd: visual molecular dynamics, Journal of molecular graphics, vol.14, issue.1, pp.33-38, 1996. ,
, A domain-specific language and compiler for stencil computations on shortvector simd and gpu architectures
Data layout transformation for stencil computations on short-vector simd architectures, Proceedings of the 27th international ACM conference on International conference on supercomputing, vol.42, pp.1-12, 2007. ,
Molecular-dynamics study of mechanical deformation in nano-crystalline aluminum. Metallurgical and materials transactions A, vol.35, pp.2719-2723, 2004. ,
Dimepack-a cache-optimized multigrid library, PROC. OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICA-TIONS (PDPTA 2001), VOLUME I. Citeseer, 2001. ,
Exastencils: advanced stencil-code engineering, European Conference on Parallel Processing, pp.553-564, 2014. ,
The objective caml system release 3.11. Documentation and user's manual. INRIA, 2008. ,
Polylib: A library for manipulating parameterized polyhedra, 1999. ,
Automatic tiling of iterative stencil loops, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.26, issue.6, pp.975-1028, 2004. ,
, Methods of theoretical physics, 1946.
, Paulius MICIKEVICIUS : 3d finite difference computation on gpus using cuda, Proceedings of 2nd workshop on general purpose processing on graphics processing units, pp.79-84, 2009.
Paragent: A domain-specific semi-automatic parallelization tool, International Conference on High-Performance Computing, pp.141-148, 2000. ,
Physis: an implicitly parallel programming model for stencil computations on large-scale gpu-accelerated supercomputers, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, p.11, 2011. ,
A performance study for iterative stencil loops on gpus with ghost zone optimizations, International Journal of Parallel Programming, vol.39, issue.1, pp.115-142, 2011. ,
Polymage: Automatic optimization for image processing pipelines, In ACM SIGARCH Computer Architecture News, vol.43, pp.429-443, 2015. ,
Large-scale simulation of polymer electrolyte fuel cells by parallel computing, Chemical Engineering Science, vol.59, issue.16, pp.3331-3343, 2004. ,
Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computers, High Performance Computing, Networking, Storage and Analysis (SC), 2010 International Conference for, vol.83, pp.1-13, 1994. ,
, CUDA NVIDIA : Programming guide, 2010.
Fast and effective orchestration of compiler optimizations for automatic performance tuning, Proceedings of the International Symposium on Code Generation and Optimization, pp.319-332, 2006. ,
Implementing the himeno benchmark with cuda on gpu clusters, Parallel & Distributed Processing (IPDPS), pp.1-10, 2010. ,
Gromacs 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics, vol.29, issue.7, pp.845-854, 2013. ,
Pskel: A stencil programming framework for cpu-gpu systems, Concurrency and Computation: Practice and Experience, vol.27, issue.17, pp.4938-4953, 2015. ,
, The art of molecular dynamics simulation, 2004.
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, ACM SIGPLAN Notices, vol.48, issue.6, pp.519-530, 2013. ,
Parameterized tiled loops for free, ACM SIGPLAN Notices, vol.42, pp.405-414, 2007. ,
Tiling optimizations for 3d scientific computations, Proceedings of the 2000 ACM/IEEE conference on Supercomputing, p.32, 2000. ,
Cache-efficient multigrid algorithms. The International Journal of High Performance Computing Applications, vol.18, pp.115-133, 2004. ,
Automatic Code Generation for Iterative Multi-dimensional Stencil Computations, Anne BENOÎT, édi-teur : High Performance Computing, Data, and Analitics, p.2016 ,
Resource-Centered Distributed Processing of Large Histopathology Images, 19th IEEE International Conference on Computational Science and Engineering, 2016. ,
Opencl: A parallel programming standard for heterogeneous computing systems, Computing in science & engineering, vol.12, issue.3, pp.66-73, 2010. ,
A scalable auto-tuning framework for compiler optimization, New tiling techniques to improve cache temporal locality, vol.34, pp.1-12, 1999. ,
The pochoir stencil compiler, Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, pp.117-128, 2002. ,
Coding stencil computations using the pochoir stencil-specification language, Poster session presented at the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011. ,
Auto-tuning for energy usage in scientific applications, European Conference on Parallel Processing, pp.178-187, 2011. ,
The cimg library: http://cimg. sourceforge. net. The C++ Template Image Processing Library, 2004. ,
The finite-difference timedomain method for numerical modeling of electromagnetic wave interactions, Electromagnetics, vol.10, issue.1-2, pp.105-126, 1990. ,
Mint: realizing cuda performance in 3d stencil methods with annotated c, Proceedings of the international conference on Supercomputing, pp.214-224, 2011. ,
Polyhedral parallel code generation for cuda, ACM Transactions on Architecture and Code Optimization (TACO), vol.9, issue.4, p.54, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00786677
Oski: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, vol.16, p.521, 2005. ,
Lattice boltzmann simulation optimization on leading multicore platforms, International Congress on Mathematical Software, pp.1-14, 2008. ,
Automatically tuned linear algebra software, Supercomputing, 1998. SC98. IEEE/ACM Conference on, pp.38-38, 1998. ,
,
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization, Computer Software and Applications Conference, vol.1, pp.579-586, 2009. ,
Program development by stepwise refinement, Communications of the ACM, vol.14, issue.4, pp.221-227, 1971. ,
The potential impact of male circumcision on hiv in sub-saharan africa, PLoS medicine, vol.3, issue.7, p.262, 2006. ,
More iteration space tiling, Proceedings of the 1989 ,
, ACM/IEEE conference on Supercomputing, pp.655-664, 1989.
Using time skewing to eliminate idle time due to memory bandwidth and network limitations, Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pp.171-180, 2000. ,
Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system, Journal of Parallel and Distributed Computing, vol.73, issue.12, pp.1592-1604, 2013. ,
, Loop tiling for parallelism, vol.575, 2012.
Hierarchical overlapped tiling, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.207-218, 2012. ,
Introducing a parallel cache oblivious blocking approach for the lattice boltzmann method, Proceedings of the Tenth International Symposium on Code Generation and Optimization, vol.8, pp.179-188, 2008. ,