Forget Moore's law: Hot and slow DRAM is a major roadblock to exascale and beyond. ExtremeTech. URL http://www.extremetech.com/computing/185797-forget-moores-law-hot-and-slow-dram-is-a-major-roadblock-to-exascale-and-beyond

Intel, Xeon E5s Aimed Squarely at HPC. HPCWire. URL http

OpenACC, Directives for Accelerators, version 2.0, 2013

PARAM Yuva-II, India's fastest supercomputer, 2013. URL http

Altera FPGAs, 2015. URL https

CUDA C Programming Guide. NVIDIA, 2015. URL https

OpenCL, The open standard for parallel programming of heterogeneous systems, version 2.0. Khronos, 2015. URL https

R. Abdelkhalek, Accélération matérielle pour l'imagerie sismique : modélisation, migration et interprétation, 2013.

R. Abdelkhalek, H. Calandra, O. Coulaud, G. Latu, and J. Roman, Fast seismic modeling and reverse time migration on a graphics processing unit cluster, Concurrency and Computation: Practice and Experience, pp.739-750, 2012.

K. Aki and P. G. Richards, Quantitative Seismology, Second Edition, 2002.

F. Alted, Why Modern CPUs Are Starving and What Can Be Done about It, Computing in Science & Engineering, vol.12, issue.2, 2010.
DOI : 10.1109/MCSE.2010.51

Z. Alterman and F. C. Karal, Propagation of elastic waves in layered media by finite difference methods, pp.367-398, 1968.

J. E. Anderson, L. Tan, and D. Wang, Time-Reversal Methods for RTM and FWI. Society of Exploration Geophysicists, 2011.

J. E. Anderson, L. Tan, and D. Wang, Time-reversal checkpointing methods for RTM and FWI, GEOPHYSICS, vol.77, issue.4, p.93, 2012.
DOI : 10.1190/geo2011-0114.1

M. Araya-Polo, F. Rubio, R. de la Cruz, M. Hanzich, J. M. Cela et al., High-Performance Seismic Acoustic Imaging by Reverse-Time Migration on the Cell/B.E. Architecture, ISCA2008 - WCSA2008 / Scientific Programming Special Issue on High Performance Computing on Cell B.E. Processors, 2008.

M. Araya-polo, J. Cabezas, M. Hanzich, M. Pericas, F. Rubio et al., Assessing Accelerator-Based HPC Reverse Time Migration, IEEE Transactions on Parallel and Distributed Systems, vol.22, issue.1, pp.147-162, 2011.
DOI : 10.1109/TPDS.2010.144

V. Arslan, J. Y. Blanc, M. Tchiboukdjian, P. Thierry, and G. Thomas-collignon, Design and Performance of an Intel Xeon Phi based Cluster for Reverse Time Migration, EAGE Workshop on High Performance Computing for Upstream, 2014.
DOI : 10.3997/2214-4609.20141909

C. Baldassari, Modelling and numerical simulation for land migration by wave equation, 2009.
URL : https://hal.archives-ouvertes.fr/tel-00472810

E. Baysal, D. Kosloff, and J. Sherwood, Reverse time migration, GEOPHYSICS, vol.48, issue.11, pp.1514-1524, 1983.
DOI : 10.1190/1.1441434

H. Ben Hadj Ali, Three dimensional visco-acoustic frequency-domain full waveform inversion, 2009.

J. Berenger, A perfectly matched layer for the absorption of electromagnetic waves, Journal of Computational Physics, vol.114, issue.2, pp.185-200, 1994.
DOI : 10.1006/jcph.1994.1159

S. Bihan, G. Moulard, R. Dolbeau, H. Calandra, and R. Abdelkhalek, Directive-based Heterogeneous Programming: A GPU-Accelerated RTM Use Case, 2009.

R. Bording and L. Lines, Seismic Modeling and Imaging with the Complete Wave Equation. Society of Exploration Geophysicists, 1997.

S. Breuer, M. Steuwer, and S. Gorlatch, Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems, 2014.

J. Brittan, J. Bai, H. Delome, C. Wang, and D. Yingst, Full waveform inversion - the state of the art. SEG Technical Program Expanded Abstracts, pp.75-81, 2013.

R. J. Brown, R. R. Stewart, J. E. Gaiser, and D. C. Lawton, An acquisition polarity standard for multicomponent seismic data, 2000.

C. Bryan, OpenCL optimization case study support vector machine training, 2011.

I. Buck, The Evolution of GPUs for General Purpose Computing. NVIDIA, 2010. URL http

H. Calandra, La puissance de calcul au service de la géophysique, pp.85-89, 2006.

G. Calandrini, A. Gardel, I. Bravo, and P. Revenga, Power Measurement Methods for Energy Efficient Applications. Sensors, 2013.

J. Carcione, G. Herman, and A. Ten-kroode, Seismic modeling, GEOPHYSICS, vol.67, issue.4, pp.1304-1325, 2002.
DOI : 10.1190/1.1500393

W. Chen, P. Kosmas, M. Leeser, and C. Rappaport, An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm, Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, FPGA '04, pp.213-222, 2004.
DOI : 10.1145/968280.968311

A. J. Chorin, Numerical solution of the Navier-Stokes equations, Mathematics of Computation, vol.22, issue.104, pp.745-762, 1968.
DOI : 10.1090/S0025-5718-1968-0242392-2

G. Chow, A. Tse, Q. Jin, W. Luk, P. Leong et al., A mixed precision Monte Carlo methodology for reconfigurable accelerator systems, Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, 2012.
DOI : 10.1145/2145694.2145705

M. Christen, O. Schenk, P. Messmer, E. Neufeld, and H. Burkhart, Accelerating Stencil-Based Computations by Increased Temporal Locality on Modern Multi- and Many-Core Architectures, Proc. of First International Workshop on New Frontiers in High-performance and Hardware-aware Computing, 2008.

C. Chu and P. Stoffa, Implicit finite-difference simulations of seismic wave propagation, GEOPHYSICS, vol.77, issue.2, pp.57-67, 2012.
DOI : 10.1190/geo2011-0180.1

H. Chuan, Z. Wei, and L. Mi, Time Domain Numerical Simulation for Transient Waves on Reconfigurable Coprocessor Platform, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05), pp.127-136, 2005.
DOI : 10.1109/FCCM.2005.65

C. Chu, P. L. Stoffa, and R. Seif, 3D Seismic Modeling And Reverse-Time Migration With the Parallel Fourier Method Using Non-blocking Collective Communications, 2009.

J. F. Claerbout, TOWARD A UNIFIED THEORY OF REFLECTOR MAPPING, GEOPHYSICS, vol.36, issue.3, pp.467-481, 1971.
DOI : 10.1190/1.1440185

J. F. Claerbout, Fundamentals of Geophysical Data Processing, 1976.

J. F. Claerbout, Imaging the Earth's Interior (IEI), 1985.

R. G. Clapp, Reverse time migration with random boundaries, SEG Technical Program Expanded Abstracts 2009, pp.2809-2813, 2009.
DOI : 10.1190/1.3255432

R. G. Clapp, Reverse time migration: Saving the boundaries, 2009.

R. G. Clapp, H. Fu, and O. Lindtjorn, Selecting the right hardware for reverse time migration, The Leading Edge, vol.29, issue.1, 2010.
DOI : 10.1190/1.3284053

G. Cocco and A. Cisternino, Device specialization in heterogeneous multi-GPU environments, 2012 Imperial College Computing Student Workshop, 2012.

J. A. Coffeen, Seismic Exploration Fundamentals. Pennwell Books, 1986.

M. Commer and G. Newman, A parallel finite-difference approach for 3D transient electromagnetic modeling with galvanic sources, GEOPHYSICS, vol.69, issue.5, pp.1192-1202, 2004.
DOI : 10.1190/1.1801936

C. Couder-Castañeda, C. Ortiz-Alemán, M. Gabriel, M. Orozco-del-Castillo, and M. Nava-Flores, TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble, Journal of Applied Mathematics, 2013.

R. Courant, K. Friedrichs, and H. Lewy, On the Partial Difference Equations of Mathematical Physics, IBM Journal of Research and Development, vol.11, issue.2, pp.215-234, 1967.
DOI : 10.1147/rd.112.0215

M. Daga, A. M. Aji, and W. Feng, On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing, 2011 Symposium on Application Accelerators in High-Performance Computing, 2011.
DOI : 10.1109/SAAHPC.2011.29

K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter et al., Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2008), pp.1-12, 2008.

K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf et al., Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, SIAM Review, vol.51, issue.1, pp.129-159, 2009.
DOI : 10.1137/070693199

K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker et al., Auto-tuning the 27-point stencil for multicore, Proc. iWAPT2009: The Fourth International Workshop on Automatic Performance Tuning, 2009.

M. C. Delorme, T. S. Abdelrahman, and C. Zhao, Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit, 2013 42nd International Conference on Parallel Processing, 2013.
DOI : 10.1109/ICPP.2013.43

D. A. Donzis and K. Aditya, Asynchronous finite-difference schemes for partial differential equations, Journal of Computational Physics, vol.274, pp.370-392, 2014.
DOI : 10.1016/j.jcp.2014.06.017

D. N. Arnold, Stability, consistency, and convergence of numerical discretizations, 2014.

G. G. Drijkoningen, Exploration seismics. URL http://geodus1.ta.tudelft

J. P. Durbano and F. E. Ortiz, FPGA-Based Acceleration of the 3D Finite-Difference Time-Domain Method, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp.156-163, 2004.
DOI : 10.1109/FCCM.2004.37

H. Dursun, K. Nomura, L. Peng, R. Seymour, W. Wang et al., In-Core Optimization of High-Order Stencil Computations, pp.533-538, 2009.

E. Dussaud, W. W. Symes, and P. Williamson, Computational strategies for reverse-time migration, SEG Technical Program Expanded Abstracts 2008, pp.2267-2271, 2008.
DOI : 10.1190/1.3059336

CAPS Entreprise, HMPP Workbench, a directive-based compiler for hybrid computing, 2007.

A. Farjallah, Preparing Depth Imaging Applications for Exascale Challenges and Impacts, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01165085

R. P. Fletcher and J. O. A. Robertsson, Time-varying boundary conditions in simulation of seismic wave propagation, GEOPHYSICS, vol.76, issue.1, pp.1-6, 2011.
DOI : 10.1190/1.3511526

D. Foltinek, D. Eaton, J. Mahovsky, P. Moghaddam, and R. McGarry, Industrial-scale reverse time migration on GPU hardware, SEG Annual Meeting, 2009.

H. Fu and R. G. Clapp, Eliminating the memory bottleneck, Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, FPGA '11, pp.65-74, 2011.
DOI : 10.1145/1950413.1950429

J. Gazdag, Wave equation migration with the phase-shift method, GEOPHYSICS, vol.43, issue.7, pp.1342-1351, 1978.
DOI : 10.1190/1.1440899

S. Ghosh, S. Chandrasekaran, and B. Chapman, Energy Analysis of Parallel Scientific Kernels on Multiple GPUs, 2012 Symposium on Application Accelerators in High Performance Computing, 2012.
DOI : 10.1109/SAAHPC.2012.17

S. Ghosh, T. Liao, H. Calandra, and B. M. Chapman, Performance of CPU/GPU compiler directives on ISO/TTI kernels, Computing, vol.96, pp.1149-1162, 2013.
DOI : 10.1007/s00607-013-0367-4

S. H. Gray, J. Etgen, J. Dellinger, and D. Whitmore, Seismic migration problems and solutions, GEOPHYSICS, vol.66, issue.5, p.1640, 2001.
DOI : 10.1190/1.1487107

A. Griewank, Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation, Optimization Methods and Software, vol.1, issue.1, 1992.
DOI : 10.1080/10556789208805505

A. Griewank and A. Walther, Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation, ACM Transactions on Mathematical Software, vol.26, issue.1, pp.19-45, 2000.
DOI : 10.1145/347837.347846

The Portland Group, PGI Fortran and C Accelerator Programming Model, 2010.

H. Guan, Z. Li, B. Wang, and Y. Kim, A Multi-Step Approach for Efficient Reverse-Time Migration. Expanded Abstracts of 78th Annual SEG Meeting, pp.2341-2345, 2008.

A. Guitton, Shot-profile migration of multiple reflections. SEG Technical Program Expanded Abstracts, pp.1296-1299, 2002.

C.-H. Hsu and S. W. Poole, Power measurement for high performance computing: State of the art, 2011 International Green Computing Conference and Workshops, pp.1-6, 2011.
DOI : 10.1109/IGCC.2011.6008596

E. Hager, Full Azimuth Seismic Acquisition with Coil Shooting, th Biennial International Conference & Exposition on Petroleum Geophysics, 2010.

G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers, 2011.
DOI : 10.1201/EBK1439811924

G. Hager, H. Stengel, G. Wellein, J. Treibig, M. Wittmann et al., MPI+OpenMP hybrid computing (on modern multicore systems), 39th Speedup Workshop on High-Performance Computing, 2010.

G. Hager, G. Schubert, T. Schoenemeyer, and G. Wellein, Prospects for Truly Asynchronous Communication with Pure MPI and Hybrid MPI/OpenMP on Current Supercomputing Platforms, 2011.

D. Hale, Migration by the Kirchhoff, slant stack and Gaussian beam methods, p.126, 1999.

B. Hamilton and C. Webb, Room acoustics modelling using GPU-accelerated finite difference and finite volume methods on a face-centered cubic grid, 2013.

T. D. Han and T. S. Abdelrahman, hiCUDA, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pp.78-90, 2011.
DOI : 10.1145/1513895.1513902

S. Hauck and A. Dehon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, 2007.

J. L. Hennessy and D. A. Patterson, Computer Architecture, Fifth Edition: A Quantitative Approach, 2011.

L. I. Hong-wei, L. I. Bo, L. I. Hong, T. Xiao-long, and L. I. Qin, The algorithm of high order finite difference pre-stack reverse time migration and GPU implementation, Chinese Journal of Geophysics, vol.53, issue.4, pp.600-610, 2010.

L. Hongwei, D. Renwei, L. Lu, and L. Hong, Wavefield reconstruction methods for reverse time migration, Journal of Geophysics and Engineering, vol.10, issue.1, p.15004, 2013.

C. Hsu, W. Feng, and J. S. Archuleta, Towards efficient supercomputing, Proceedings of the third joint WOSP/SIPEW international conference on Performance Engineering, ICPE '12, p.230, 2005.
DOI : 10.1145/2188286.2188309

S. Huang, S. Xiao, and W. Feng, On the energy efficiency of graphics processing units for scientific computing, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5160980

D. Jacobsen, J. Thibault, and I. Senocak, An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters, 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, 2010.
DOI : 10.2514/6.2010-522

Z. Jiang, K. Bonham, J. C. Bancroft, and L. R. Lines, Overcoming computational cost problems of reverse-time migration, GeoCanada, 2010.

G. Jin, T. Endo, and S. Matsuoka, A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1080-1087, 2013.
DOI : 10.1109/IPDPSW.2013.58

J. Jossey and A. N. Hirani, Equivalence theorems in numerical analysis: integration, differentiation and interpolation, 2000.

B. Kaelin and A. Guitton, Illumination effects in reverse time migration, EAGE, 2007.

T. H. Kaiser and S. B. Baden, Overlapping Communication and Computation with OpenMP and MPI, Scientific Programming, vol.9, issue.2-3, pp.73-81, 2001.
DOI : 10.1155/2001/712152

K. S. Yee, Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media, IEEE Transactions on Antennas and Propagation, vol.14, issue.3, pp.302-307, 1966.
DOI : 10.1109/TAP.1966.1138693

P. Kearey, M. Brooks, and I. Hill, An Introduction to Geophysical Exploration, Blackwell Science, 2002.

K. Kelly, R. Ward, S. Treitel, and R. Alford, SYNTHETIC SEISMOGRAMS: A FINITE-DIFFERENCE APPROACH, GEOPHYSICS, vol.41, issue.1, pp.2-27, 1976.
DOI : 10.1190/1.1440605

Khronos Group, The OpenGL Specification version 4, 2012.

Y. Kim, Y. Cho, U. Jang, and C. Shin, Acceleration of stable TTI P-wave reverse-time migration with GPUs, Computers & Geosciences, vol.52, pp.204-217, 2013.
DOI : 10.1016/j.cageo.2012.10.013

M. Klemm, 23 tips for performance tuning with the Intel MPI Library, 2010. URL https://sharepoint.campus.rwth-aachen

S. Koliaï, Z. Bendifallah, M. Tribalat, C. Valensi, J. Acquaviva et al., Quantifying Performance Bottleneck Cost Through Differential Analysis, Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS '13, pp.263-272, 2013.

D. Komatitsch and R. Martin, An unsplit convolutional perfectly matched layer improved at grazing incidence for the seismic wave equation, GEOPHYSICS, vol.72, issue.5, pp.155-167, 2007.
DOI : 10.1190/1.2757586
URL : https://hal.archives-ouvertes.fr/inria-00528418

D. Kosloff and E. Baysal, Forward modeling by a Fourier method, GEOPHYSICS, vol.47, issue.10, pp.1402-1412, 1982.
DOI : 10.1190/1.1441288

R. Kosloff and D. Kosloff, Absorbing boundaries for wave propagation problems, Journal of Computational Physics, vol.63, issue.2, pp.363-376, 1986.
DOI : 10.1016/0021-9991(86)90199-3

S. Kronawitter, H. Stengel, G. Hager, and C. Lengauer, Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model, Parallel Processing Letters, vol.24, issue.03, 2014.
DOI : 10.1142/S0129626414410047

S. Lee and R. Eigenmann, OpenMPC: Extended OpenMP Programming and Tuning for GPUs, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.

A. Lemmer and R. Hilfer, Parallel domain decomposition method with non-blocking communication for flow through porous media, Journal of Computational Physics, vol.281, pp.970-981, 2015.
DOI : 10.1016/j.jcp.2014.08.032

W. Liang, Y. Wang, and C. Yang, Determining finite difference weights for the acoustic wave equation by a new dispersion-relationship-preserving method, Geophysical Prospecting, vol.76, issue.3, pp.11-22, 2015.
DOI : 10.1111/1365-2478.12160

L. R. Lines, R. Slawinski, and R. P. Bording, A recipe for stability of finite-difference wave-equation computations, GEOPHYSICS, vol.64, issue.3, pp.967-969, 1999.
DOI : 10.1190/1.1444605

H. Liu, B. Li, H. Liu, X. Tong, Q. Liu et al., The issues of prestack reverse time migration and solutions with Graphic Processing Unit implementation, Geophysical Prospecting, vol.76, issue.no.7, pp.906-918, 2012.
DOI : 10.1111/j.1365-2478.2011.01032.x

L. T. Ikelle and L. Amundsen, Introduction to Petroleum Seismology (Investigations in Geophysics). Society of Exploration Geophysicists, 2005.

S. A. Long, R. van Borselen, and L. Fountain, Surface-Related Multiple Elimination - Applications to an offshore Australia data set, ASEG Extended Abstracts, vol.2001, issue.1, pp.1-4, 2001.
DOI : 10.1071/ASEG2001ab077

L. Lu and K. Magerlein, Multi-level Parallel Computing of Reverse Time Migration for Seismic Imaging on Blue Gene, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.291-292, 2013.

E. Lusk and A. Chan, Early Experiments with the OpenMP/MPI Hybrid Programming Model, OpenMP in a New Era of Parallelism, pp.36-47, 2008.
DOI : 10.1007/978-3-540-79561-2_4

T. Lutz, C. Fensch, and M. Cole, PARTANS, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, 2013.
DOI : 10.1145/2400682.2400718

T. M. Malas, G. Hager, H. Ltaief, H. Stengel, G. Wellein et al., Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates, SIAM Journal on Scientific Computing, vol.37, issue.4, 2015.
DOI : 10.1137/140991133

N. Maruyama, T. Nomura, K. Sato, and S. Matsuoka, Physis, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-12, 2011.
DOI : 10.1145/2063384.2063398

G. A. McMechan, MIGRATION BY EXTRAPOLATION OF TIME-DEPENDENT BOUNDARY VALUES, Geophysical Prospecting, vol.71, issue.3, pp.413-420, 1983.
DOI : 10.1190/1.1440826

K. C. Meza-fajardo and A. S. Papageorgiou, A Nonconvolutional, Split-Field, Perfectly Matched Layer for Wave Propagation in Isotropic and Anisotropic Elastic Media: Stability Analysis, Bulletin of the Seismological Society of America, vol.98, issue.4, pp.1811-1836, 2008.
DOI : 10.1785/0120070223

D. Michéa and D. Komatitsch, Accelerating a 3D finite-difference wave propagation code using GPU graphics cards, Geophys. J. Int, vol.182, issue.1, pp.389-402, 2010.

P. Micikevicius, 3D finite difference computation on GPUs using CUDA, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pp.79-84, 2009.
DOI : 10.1145/1513895.1513905

P. Moczo, J. O. Robertsson, and L. Eisner, The Finite-Difference Time-Domain Method for Modeling of Seismic Wave Propagation, Advances in Geophysics, vol.48, pp.421-516, 2007.
DOI : 10.1016/S0065-2687(06)48008-0

W. A. Mulder and R. Plessix, A comparison between one-way and two-way wave-equation migration, GEOPHYSICS, vol.69, issue.6, p.69, 2004.
DOI : 10.1190/1.1836822

C. Märtin, Post-Dennard Scaling and the final Years of Moore's Law: Consequences for the Evolution of Multicore-Architectures, Informatik und Interaktive Systeme, 2014.

R. Nath, S. Tomov, and J. Dongarra, Accelerating GPU Kernels for Dense Linear Algebra, Proceedings of the 9th international conference on High performance computing for computational science, 2011.
DOI : 10.1007/978-3-642-01970-8_89

A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey, 3.5d blocking optimization for stencil computations on modern CPUs and GPUs, Proc. of the 2010 ACM/IEEE Int'l Conf. for High Performance Computing, Networking, Storage and Analysis, pp.1-13, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00865020

T. Okamoto, H. Takenaka, T. Nakamura, and T. Aoki, Accelerating Large-Scale Simulation of Seismic Wave Propagation by Multi-GPUs and Three-Dimensional Domain Decomposition, GPU Solutions to Multi-scale Problems in Science and Engineering, pp.375-389, 2013.

S. Operto, J. Virieux, and A. Ribodetti, Finite-difference frequency-domain modeling of viscoacoustic wave propagation in 2D tilted transversely isotropic (TTI) media, GEOPHYSICS, vol.74, issue.5, p.74, 2009.
DOI : 10.1190/1.3157243
URL : https://hal.archives-ouvertes.fr/hal-00413561

J. Panetta, T. Teixeira, P. R. De-souza-filho, C. A. Da-cunha-finho, D. Sotelo et al., Accelerating Kirchhoff Migration by CPU and GPU Cooperation, 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp.26-32, 2009.
DOI : 10.1109/SBAC-PAD.2009.29

I. Panourgias, NUMA effects on multicore, multi socket systems, 2011.

A. Pedram, R. A. Van-de-geijn, and A. Gerstlauer, Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures, IEEE Transactions on Computers, vol.61, issue.12, 2012.
DOI : 10.1109/TC.2012.132

M. P. Perrone, L. Lu, L. Liu, K. Magerlein, K. Changhoan et al., Fast Scalable Reverse Time Migration Seismic Imaging on Blue Gene, 2011.

P. Moczo, J. O. A. Robertsson, and L. Eisner, The Finite-Difference Time-Domain Method for Modeling of Seismic Wave Propagation, Advances in Wave Propagation in Heterogeneous Earth, 2007.

PGS, Multi-Azimuth 3-D Surface-Related Multiple Elimination - Application to Offshore Nile Delta, 2009.

S. Phadke, D. Bhardwaj, and S. Yerneni, 3D Seismic Modeling in a Message Passing Environment

R. Plessix, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications, Geophysical Journal International, vol.167, issue.2, pp.495-503, 2006.
DOI : 10.1111/j.1365-246X.2006.02978.x

R. G. Pratt, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model, GEOPHYSICS, vol.64, issue.3, pp.888-901, 1999.
DOI : 10.1190/1.1444597

R. Rastogi, A. Srivastava, K. Sirasala, H. Chavhan, and K. Khonde, Experience of Porting and Optimization of Seismic Modelling on Multi and Many Cores of Hybrid Computing Cluster, 77th EAGE Conference and Exhibition 2015, 2015.
DOI : 10.3997/2214-4609.201413106

G. Ritter and K. Waddell, Resolving Complex Salt Geometry: Iterative Salt Imaging and Interpretation, 2012.

G. Rivera and C. Tseng, Tiling Optimizations for 3D Scientific Computations, ACM/IEEE SC 2000 Conference (SC'00), 2000.
DOI : 10.1109/SC.2000.10015

E. Robein, Vitesses et techniques d'imagerie en sismique réflexion, 1999.

E. Robein, Seismic Imaging: A Review of the Techniques, their Principles, Merits and Limitations, 2010.
DOI : 10.3997/9789073781788

J. A. Roden and S. D. Gedney, Convolution PML (CPML): An efficient FDTD implementation of the CFS-PML for arbitrary media. Microwave and Optical Technology Letters, pp.334-339, 2000.

Y. Ruan, V. S. Pai, E. Nahum, and J. M. Tracey, Evaluating the Impact of Simultaneous Multithreading on Network Servers Using Real Hardware, Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '05, pp.315-326, 2005.

I. Said, H. Calandra, and T. Liao, FPGA based technology evaluation: a case study of seismic depth imaging application, 2010.

W. A. Schneider, INTEGRAL FORMULATION FOR MIGRATION IN TWO AND THREE DIMENSIONS, GEOPHYSICS, vol.43, issue.1, pp.49-76, 1978.
DOI : 10.1190/1.1440828

G. Schubert, G. Hager, H. Fehske, and G. Wellein, Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1751-1758, 2011.
DOI : 10.1109/IPDPS.2011.332

G. T. Schuster, Basics of Seismic Imaging, 2010.

G. T. Schuster, Basics of Seismic Wave Theory, 2007.

SEG, The 2004 BP Velocity-Analysis Benchmark, 2004. URL http://software.seg.org/datasets

R. L. Sengbush, Seismic Exploration Methods
DOI : 10.1007/978-94-011-6397-2

Sercel, What on Earth is Geophysics?, 2013. URL http://www.sercel.com/about/Pages/what-is-geophysics

P. Sguazzero and J. Gazdag, Migration of Seismic Data, 1984.

M. Shafiq, M. Pericas, R. De-la-cruz, M. Araya-polo, N. Navarro et al., Exploiting memory customization in FPGA for 3D stencil computations, 2009 International Conference on Field-Programmable Technology, pp.38-45, 2009.
DOI : 10.1109/FPT.2009.5377644

P. M. Shearer, Introduction to Seismology, 2009.

T. Shimokawabe, T. Aoki, T. Takaki, T. Endo, A. Yamanaka et al., Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063388

M. Si, A. J. Peña, P. Balaji, M. Takagi, and Y. Ishikawa, MT-MPI, Proceedings of the 28th ACM international conference on Supercomputing, ICS '14, pp.125-134, 2014.
DOI : 10.1145/2597652.2597658

M. R. Simmons, The world's giant oilfields, Simmons & Company International, 2001.

H. Stengel, J. Treibig, G. Hager, and G. Wellein, Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model, Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, 2015.
DOI : 10.1145/2751205.2751240

R. H. Stolt, MIGRATION BY FOURIER TRANSFORM, GEOPHYSICS, vol.43, issue.1, pp.23-48, 1978.
DOI : 10.1190/1.1440826

R. Strzodka, M. Shaheen, D. Pajak, and H. Seidel, Cache Accurate Time Skewing in Iterative Stencil Computations, 2011 International Conference on Parallel Processing, pp.571-581, 2011.
DOI : 10.1109/ICPP.2011.47

R. Suda and D. Q. Ren, Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing, 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies, 2009.
DOI : 10.1109/PDCAT.2009.65

M. Sugawara, S. Hirasawa, K. Komatsu, H. Takizawa, and H. Kobayashi, A Comparison of Performance Tunabilities between OpenCL and OpenACC, 2013 IEEE 7th International Symposium on Embedded Multicore Socs, pp.147-152, 2013.
DOI : 10.1109/MCSoC.2013.31

S. Sur, H. Jin, L. Chai, and D. K. Panda, RDMA read based rendezvous protocol for MPI over InfiniBand, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming , PPoPP '06, pp.32-39, 2006.
DOI : 10.1145/1122971.1122978

W. W. Symes, Reverse time migration with optimal checkpointing, GEOPHYSICS, vol.72, issue.5, pp.213-221, 2007.
DOI : 10.1190/1.2742686

R. Thakur and W. Gropp, Test suite for evaluating performance of multithreaded MPI communication, Parallel Computing, vol.35, issue.12, pp.608-617, 2008.
DOI : 10.1016/j.parco.2008.12.013

J. I. Toivanen, T. P. Stefanski, N. Kuster, and N. Chavannes, COMPARISON OF CPML IMPLEMENTATIONS FOR THE GPU-ACCELERATED FDTD SOLVER, Progress in Electromagnetics Research M, pp.61-75, 2011.
DOI : 10.2528/PIERM11061002

T. Udagawa and M. Sekijima, The Power Efficiency of GPUs in Multi Nodes Environment with Molecular Dynamics, 2011 40th International Conference on Parallel Processing Workshops, 2011.
DOI : 10.1109/ICPPW.2011.43

O. Villa, D. R. Johnson, M. O'Connor, E. Bolotin, D. Nellans et al., Scaling the Power Wall: A Path to Exascale, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.
DOI : 10.1109/SC.2014.73

A. Villarreal and J. A. Scales, Distributed three-dimensional finite-difference modeling of wave propagation in acoustic media, Computers in Physics, 1997.

J. Virieux and S. Operto, An overview of full-waveform inversion in exploration geophysics, GEOPHYSICS, vol.74, issue.6, p.1, 2009.
DOI : 10.1190/1.3238367
URL : https://hal.archives-ouvertes.fr/hal-00457989

J. Virieux, H. Calandra, and R.-É. Plessix, A review of the spectral, pseudo-spectral, finite-difference and finite-element modelling techniques for geophysical imaging, Geophysical Prospecting, vol.136, issue.5, pp.794-813, 2011.
DOI : 10.1111/j.1365-2478.2011.00967.x
URL : https://hal.archives-ouvertes.fr/insu-00681794

V. Volkov, Better Performance at Lower Occupancy, GPU Technology Conference, 2010.

V. Volkov and J. W. Demmel, Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.8, 2008.
DOI : 10.1109/SC.2008.5214359

W. Sun and L.-Y. Fu, Two effective approaches to reduce data storage in reverse time migration, Computers & Geosciences, vol.56, pp.69-75, 2013.

N. Whitmore, Iterative depth migration by backward time propagation, SEG Technical Program Expanded Abstracts 1983, 1983.
DOI : 10.1190/1.1893867

M. Wittmann, G. Hager, and G. Wellein, Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-7, 2010.
DOI : 10.1109/IPDPSW.2010.5470813

P. Zandevakili, M. Hu, and Z. Qin, GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units, PLoS ONE, vol.28, issue.5, 2012.
DOI : 10.1371/journal.pone.0036865.t001

G. Zumbusch, Tuning a Finite Difference Computation for Parallel Vector Processors, International Symposium on Parallel and Distributed Computing, pp.63-70, 2012.

Appendix A. List of publications

H. Calandra, R. Dolbeau, P. Fortin, J. Lamotte, and I. Said, Assessing the relevance of APU for high performance scientific computing, AMD Fusion Developer Summit (AFDS), 2012.

H. Calandra, R. Dolbeau, P. Fortin, J. Lamotte, and I. Said, Evaluation of Successive CPUs/APUs/GPUs Based on an OpenCL Finite Difference Stencil, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2013.
DOI : 10.1109/PDP.2013.65
URL : https://hal.archives-ouvertes.fr/hal-01216513

H. Calandra, R. Dolbeau, P. Fortin, J. Lamotte, and I. Said, Forward seismic modeling on AMD Accelerated Processing Unit

P. Eberhart, I. Said, P. Fortin, and H. Calandra, Hybrid strategy for stencil computations on the APU, The 1st International Workshop on High-Performance Stencil Computations, 2014.

F. Jézéquel, J. Lamotte, and I. Said, Estimation of numerical reproducibility on CPU and GPU, 2015.

I. Said, P. Fortin, J. Lamotte, and H. Calandra, Leveraging the Accelerated Processing Units for seismic imaging: a performance and power efficiency comparison against CPUs and GPUs

I. Said, P. Fortin, J. Lamotte, and H. Calandra, Efficient Reverse Time Migration on APU clusters, Rice Oil & Gas HPC Workshop, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01306648