R. Ayguadé, R. M. Badia, P. Bellens, D. Cabrera, A. Duran et al., Extending OpenMP to Survive the Heterogeneous Multi-Core Era, International Journal of Parallel Programming, vol.38, issue.5-6, pp.440-459, 2010.
DOI : 10.1007/s10766-010-0135-4

C. Ancourt, F. Coelho, B. Creusillet, F. Irigoin, P. Jouvelot et al., PIPS : a Workbench for Program Parallelization and Optimization, European Parallel Tool Meeting (EPTM), 1996.

M.-W. Benabderrahmane, L.-N. Pouchet, A. Cohen, and C. Bastoul, The Polyhedral Model Is More Widely Applicable Than You Think, Compiler Construction, pp.283-303, 2010.
DOI : 10.1007/978-3-642-11970-5_16

URL : https://hal.archives-ouvertes.fr/inria-00551087

J. C. Beyer, E. J. Stotzer, A. Hart, and B. R. de Supinski, OpenMP for Accelerators, OpenMP in the Petascale Era, pp.108-121, 2011.

H. E. Bal, J. G. Steiner, and A. S. Tanenbaum, Programming languages for distributed computing systems, ACM Computing Surveys, vol.21, issue.3, pp.261-322, 1989.
DOI : 10.1145/72551.72552

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.7873

B. Chapman, G. Jost, and R. van der Pas (foreword by D. J. Kuck), Using OpenMP: Portable Shared Memory Parallel Programming, 2007.

ClearSpeed, CSX700 Floating Point Processor Datasheet, 2008.

P. Carribault, M. Pérache, and H. Jourdren, Enabling Low-Overhead Hybrid MPI/OpenMP Parallelism with MPC, in M. Sato, T. Hanawa, M. S. Müller, B. M. Chapman, and B. R. de Supinski (eds.), Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Lecture Notes in Computer Science, vol.6132, pp.1-14, 2010.
DOI : 10.1007/978-3-642-13217-9_1

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A Hybrid Multi-core Parallel Programming Environment, Proceedings of GPGPU, First Workshop on General Purpose Processing on Graphics Processing Units, 2007.

J. J. Dongarra, P. Luszczek, and A. Petitet, LINPACK Benchmark, pp.803-820, 2003.
DOI : 10.1007/978-0-387-09766-4_155

V. V. Dimakopoulos, E. Leontiadis, and G. Tzoumas, A portable C compiler for OpenMP V.2.0, EWOMP 2003, 2003.

J. J. Dongarra, LINPACK: Users' Guide, Number 8, Society for Industrial Mathematics, 1979.

R. Duncan, A survey of parallel computer architectures, Computer, vol.23, issue.2, pp.5-16, 1990.

W. Feng and K. W. Cameron, The Green500 List: Encouraging Sustainable Supercomputing, Computer, vol.40, issue.12, pp.50-55, 2007.
DOI : 10.1109/MC.2007.445

M. J. Flynn, Very high-speed computing systems, Proceedings of the IEEE, vol.54, issue.12, pp.1901-1909, 1966.

M. J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, vol.C-21, issue.9, pp.948-960, 1972.

M. J. Flynn and K. W. Rudd, Parallel architectures, ACM Computing Surveys, vol.28, issue.1, pp.67-70, 1996.

W. Feng and T. Scogland, The Green500 List: Year one, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-7, 2009.
DOI : 10.1109/IPDPS.2009.5160978

K. Gregory, Overview and C++ AMP approach, 2011.

E. Johnson, Completing an MIMD multiprocessor taxonomy, ACM SIGARCH Computer Architecture News, vol.16, issue.3, pp.44-47, 1988.

S. Karlsson and M. Brorsson, A free OpenMP compiler and run-time library infrastructure for research on shared memory parallel computing, Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems, pp.354-361, 2004.

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4/5, pp.589-604, 2005.

D. B. Kirk and W. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach.

C. Lattner and V. Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), 2004.

S. Lee, S.-J. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization, PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.101-110, 2009.

C. Liao, D. Quinlan, T. Panas, and B. R. de Supinski, A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, in M. Sato, T. Hanawa, M. S. Müller, B. M. Chapman, and B. R. de Supinski (eds.), Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Lecture Notes in Computer Science, vol.6132, pp.15-28, 2010.
DOI : 10.1007/978-3-642-13217-9_2

J. Lengyel, M. Reichert, B. R. Donald, and D. P. Greenberg, Real-time robot motion planning using rasterizing computer graphics hardware, 1990.

R. K. Malladi, R. Dodson, and V. Kitaeff, Intel® many integrated core (MIC) architecture, Proceedings of the 2012 workshop on High-Performance Computing for Astronomy Date, Astro-HPC '12, pp.5-6, 2012.
DOI : 10.1145/2286976.2286979

G. E. Moore, Cramming More Components Onto Integrated Circuits, 1965; reprinted in Proceedings of the IEEE, vol.86, issue.1, 1998.
DOI : 10.1109/JPROC.1998.658762

G. E. Moore, Progress in digital integrated electronics, Electron Devices Meeting, pp.11-13, 1975.

R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang, Introducing the Graph 500, 2010.

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

G. Noaje, C. Jaillet, and M. Krajecki, Source-to-Source Code Translator: OpenMP C to CUDA, 2011 IEEE International Conference on High Performance Computing and Communications, pp.512-519, 2011.
DOI : 10.1109/HPCC.2011.73

G. Noaje, M. Krajecki, and C. Jaillet, MultiGPU computing using MPI or OpenMP, Proceedings of the 2010 IEEE 6th International Conference on Intelligent Computer Communication and Processing, pp.347-354, 2010.
DOI : 10.1109/ICCP.2010.5606414

NVIDIA Corporation, CUBLAS Library Version 5.0.

NVIDIA Corporation, CUDA Best Practices Guide Version 5.0.

NVIDIA Corporation, CUDA Programming Guide Version 5.0.

NVIDIA Corporation, Fermi Architecture Whitepaper.

L. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam et al., Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.14

URL : https://hal.archives-ouvertes.fr/inria-00551067

The Portland Group, PGI Fortran & C Accelerator Programming Model, 2008.

M. J. Quinn, Parallel Programming in C with MPI and OpenMP, 2003.

Raoult, HMPP Workbench - Build manycore applications, GPU Accelerators and Hybrid Computing Workshop, 2009.

L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash et al., Larrabee : a many-core x86 architecture for visual computing

M. Sato, H. Harada, and Y. Ishikawa, OpenMP compiler for a software distributed shared memory system SCASH, 2000.

H. H. B. Sørensen, Auto-tuning of level 1 and level 2 BLAS for GPUs, Concurrency and Computation: Practice and Experience, 2012.

D. B. Skillicorn and D. Talia, Models and languages for parallel computation, ACM Computing Surveys, vol.30, issue.2, pp.123-169, 1998.
DOI : 10.1145/280277.280278

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.1801

V. Volkov, Better performance at lower occupancy, Proceedings of the GPU Technology Conference, 2010.

M. Wolfe, Compilers and More : A GPU and Accelerator Programming Model, 2008.

M. Young, A. Tevanian, R. Rashid, D. Golub, J. Eppinger et al., The duality of memory and communication in the implementation of a multiprocessor operating system, SIGOPS Operating Systems Review, vol.21, issue.5, pp.63-76, 1987.