Implementing MPI on the BlueGene/L Supercomputer, Euro-Par 2004 Parallel Processing, pp.833-845, 2004. ,
DOI : 10.1007/978-3-540-27866-5_112
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, p.31, 2010. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Hatem Ltaief, and Stanimire Tomov. LU factorization for accelerator-based systems, 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), p.31, 2011. ,
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, p.31, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Early evaluation of IBM BlueGene/P, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-2312, 2008. ,
DOI : 10.1109/SC.2008.5214725
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
The Design of OpenMP Tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009. ,
DOI : 10.1109/TPDS.2008.105
A case for NOW (networks of workstations), IEEE Micro, vol.15, issue.1, pp.54-64, 1995. ,
Shared memory consistency models: a tutorial, Computer, vol.29, issue.12, pp.66-76, 1996. ,
DOI : 10.1109/2.546611
Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the International Conference on High Performance Computing (HiPC), p.29, 2006. ,
DOI : 10.1007/11945918_35
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.128.7975
Advanced Micro Devices HyperTransport Technology I/O Link, A High-Bandwidth I/O Architecture, p.29, 2001. ,
Packaging technology for the NEC SX-3/SX-X Supercomputer, 40th Conference Proceedings on Electronic Components and Technology, pp.525-533, 1990. ,
DOI : 10.1109/ECTC.1990.122238
StarPU : A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation : Practice and Experience, Special Issue : Euro-Par, pp.187-198, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
The NAS Parallel Benchmarks, PVM/MPI, pp.63-73, 1991. ,
DOI : 10.1177/109434209100500306
Fine-grained multithreading support for hybrid threaded MPI programming, IJHPCA, vol.24, pp.49-57, 2010. ,
Myrinet: a gigabit-per-second local area network, Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), pp.29-36, 1995. ,
DOI : 10.1109/40.342015
Exploiting data similarity to reduce memory footprints, IPDPS, pp.152-163, 2011. ,
A Microbenchmark Suite for Mixed-Mode OpenMP/MPI, IWOMP'09, pp.118-131, 2009. ,
DOI : 10.1155/2001/450503
A multithreaded PowerPC processor for commercial servers, IBM Journal of Research and Development, vol.44, issue.6, pp.885-898, 2000. ,
DOI : 10.1147/rd.446.0885
Java threads - a white paper, p.37, 1996. ,
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.62, issue.5-6, pp.418-439, 2010. ,
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, vol.33, issue.9, pp.634-644, 2007. ,
DOI : 10.1016/j.parco.2007.06.003
URL : https://hal.archives-ouvertes.fr/hal-00344327
Exploiting Locality on the Cell/B.E. through Bypassing, Proceedings of the 9th International Workshop on Embedded Computer Systems : Architectures, Modeling, and Simulation, SAMOS '09, pp.318-328, 2009. ,
DOI : 10.1147/rd.515.0593
De l'exécution d'applications scientifiques OpenMP sur architectures hiérarchiques, p.71, 2010. ,
Grid'5000: a large scale and highly reconfigurable grid experimental testbed, The 6th IEEE/ACM International Workshop on Grid Computing, 2005., pp.99-106, 2005. ,
DOI : 10.1109/GRID.2005.1542730
URL : https://hal.archives-ouvertes.fr/hal-00684943
Introduction to UPC and language specification, for Computing Sciences, p.46, 1999. ,
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, IEEE Micro, vol.30, issue.2, pp.16-29, 2010. ,
DOI : 10.1109/MM.2010.31
An Introduction to the Intel QuickPath Interconnect, pp.29-30, 2009. ,
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing, 2011 Symposium on Application Accelerators in High-Performance Computing, p.32, 2011. ,
DOI : 10.1109/SAAHPC.2011.29
HMPP : A hybrid multi-core parallel programming environment, pp.1-5, 2007. ,
iWarp protocol kernel space software implementation, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pp.274-274, 2006. ,
DOI : 10.1109/IPDPS.2006.1639565
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.74.3563
A threads-only MPI implementation for the development of parallel programs, Proceedings of the 11th International Symposium on High Performance Computing Systems, pp.153-163, 1997. ,
The native POSIX thread library for Linux. ,
An Efficient Multi-level Trace Toolkit for Multi-threaded Applications, EuroPar, p.81, 2005. ,
DOI : 10.1007/11549468_21
URL : https://hal.archives-ouvertes.fr/hal-00360309
Dynamic load balancing of unbalanced computations using message passing, IEEE International Parallel and Distributed Processing Symposium, pp.1-8. ,
Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs, Proceedings of SHPCC'94, 1994. ,
Scaling MPI to short-memory MPPs such as BG/L, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.209-218, 2006. ,
The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, pp.212-223, 1998. ,
DOI : 10.1145/277652.277725
[Ful99] S. Fuller. Motorola's AltiVec technology, Networking & Computing Core Technology, pp.72-82, 1999. ,
Scotch and LibScotch 5.1 User's Guide, ScAlApplix project, INRIA Bordeaux - Sud-Ouest, ENSEIRB & LaBRI, UMR CNRS 5800. ,
URL : https://hal.archives-ouvertes.fr/hal-00410332
MiMPI: A Multithread-Safe Implementation of MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.674-674, 1999. ,
DOI : 10.1007/3-540-48158-3_26
Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.353-377, 2004. ,
DOI : 10.1007/978-3-540-30218-6_19
Open MPI : Goals, concept, and design of a next generation MPI implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004. ,
Scalable Memory Use in MPI: A Case Study with MPICH2, Recent Advances in the Message Passing Interface, pp.140-149, 2011. ,
DOI : 10.1007/978-3-642-24449-0_17
High-performance message-passing over generic Ethernet hardware with Open-MX, Parallel Computing, vol.37, issue.2, pp.85-100, 2011. ,
DOI : 10.1016/j.parco.2010.11.001
URL : https://hal.archives-ouvertes.fr/inria-00533058
MPICH2: A new start for MPI implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface. ,
Design of the Tera MTA integrated circuits, Gallium Arsenide Integrated Circuit (GaAs IC) Symposium, 19th Annual, pp.31-45, 1997. ,
Hardware system of the Earth Simulator, Parallel Computing, vol.30, issue.12, pp.1287-1313, 2004. ,
DOI : 10.1016/j.parco.2004.09.004
Next generation POSIX threading, p.37 ,
[Int11] Intel. Intel Cilk Plus specification. http://software.intel.com/en-us/articles/intel-cilk-plus-specification ,
Implementation and performance evaluation of the HPC Challenge benchmarks in Coarray Fortran 2.0, Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International, pp.40-1089, 2011. ,
The OpenCL Specification, version 1.1. ,
Hyperthreading technology in the NetBurst microarchitecture, pp.3156-65, 2003. ,
Speculative defragmentation leading gigabit ethernet to true zero-copy communication, Cluster Computing, vol.4, issue.1, pp.7-18, 2001. ,
DOI : 10.1023/A:1011456024871
The LinuxThreads library, p.36, 1999. ,
The SGI Origin: a ccNUMA highly scalable server, Proceedings of the 24th annual international symposium on Computer architecture, ISCA '97, pp.241-251, 1997. ,
Condor-a hunter of idle workstations, [1988] Proceedings. The 8th International Conference on Distributed, p.11, 1988. ,
DOI : 10.1109/DCS.1988.12507
High performance Fortran, IEEE Parallel & Distributed Technology: Systems & Applications, vol.1, issue.1, pp.25-42, 1993. ,
DOI : 10.1109/88.219857
Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, In EuroPVM/MPI Lecture Notes in Computer Science, vol.5759, issue.103, pp.104-115, 2009. ,
DOI : 10.1007/978-3-642-03770-2_17
URL : https://hal.archives-ouvertes.fr/inria-00392581
Progress in digital integrated electronics, Electron Devices Meeting, pp.11-13, 1975. ,
ff. Solid-State Circuits Newsletter, pp.11433-11468, 1965. ,
[MPIa] The Message Passing Interface (MPI) standard. http://www.mcs.anl.gov ,
[MPIb] Message Passing Interface (MPI) forum. ,
Itanium 2 processor microarchitecture, IEEE Micro, vol.23, issue.2, pp.44-55, 2003. ,
DOI : 10.1109/MM.2003.1196114
PM2 : un environnement pour une conception portable et une exécution efficace des applications parallèles irrégulières, p.37, 1997. ,
Co-array Fortran for parallel programming, SIGPLAN Fortran Forum, vol.17, pp.1-31, 1998. ,
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.8, pp.696-710, 2009. ,
DOI : 10.1016/j.jpdc.2008.09.001
IEEE Standards Office, Science : IEEE Std. 1596-1992, 1993. ,
A survey of general-purpose computation on graphics hardware, Computer Graphics Forum, vol.26, issue.1. ,
Arsenic: a user-accessible gigabit Ethernet interface, Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213), pp.67-76, 2001. ,
DOI : 10.1109/INFCOM.2001.916688
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.4189
The Quadrics network: high-performance clustering technology. ,
OpenMPspy: Leveraging quality assurance for parallel software, Proceedings of the 17th international conference on Parallel processing - Volume Part II, Euro-Par'11, pp.124-135, 2011. ,
High performance messaging on workstations: Illinois Fast Messages (FM) for Myrinet, Proceedings of the 1995 ACM, pp.1528-1557, 1995. Available from http://www.c3.lanl.gov/PAL/publications/papers/Pakin1995:FM.pdf ,
[PLP] Portable Linux Processor Affinity. http://www.open-mpi.org/projects/plpa ,
BIP: a new protocol designed for high performance networking on Myrinet, Workshop PC-NOW, IPPS/SPDP98. ,
MMX technology extension to the Intel architecture. ,
Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. ,
Intel Threading Building Blocks - outfitting C++ for multicore processor parallelism, O'Reilly, pp.40-54, 2007. ,
Réseau national de télécommunications pour la technologie, l'enseignement et la recherche. ,
Work Stealing for Multi-core HPC Clusters, Euro-Par 2011 Parallel Processing, pp.205-217, 2011. ,
DOI : 10.1145/568014.379563
Implementing streaming SIMD extensions on the Pentium III processor, IEEE Micro, vol.20, issue.4, pp.47-57, 2000. ,
DOI : 10.1109/40.865866
The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978. ,
DOI : 10.1145/359327.359336
The PVM concurrent computing system: Evolution, experiences, and trends, Parallel Computing, vol.20, issue.4, pp.531-546, 1994. ,
DOI : 10.1016/0167-8191(94)90027-2
Message-passing and shared-data programming models - wish vs. reality, High Performance Computing Systems and Applications, Annual International Symposium on. ,
BEOWULF: A parallel workstation for scientific computation, Proceedings of the 24th International Conference on Parallel Processing, pp.65-131, 1995. ,
EMP, Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) , Supercomputing '01, pp.49-66, 2001. ,
DOI : 10.1145/582034.582091
Simultaneous multithreading: Maximizing on-chip parallelism, Proceedings of the 22nd Annual International Symposium on Computer Architecture, pp.392-403, 1995. ,
Test suite for evaluating performance of multithreaded MPI communication, Parallel Computing, vol.35, issue.12, pp.608-617, 2009. ,
DOI : 10.1016/j.parco.2008.12.013
The CDC 6600 Project, IEEE Annals of the History of Computing, vol.2, issue.4, pp.338-348, 1980. ,
DOI : 10.1109/MAHC.1980.10044
FACOM VP-100 Supercomputers with ease of use, pp.87-107, 1985. ,
Top500 Supercomputing Sites. ,
EZTrace: a generic framework for performance analysis, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Poster Session. ,
Optimizing threaded MPI execution on SMP clusters, Proceedings of the 15th ACM International Conference on Supercomputing, pp.81-381, 2001. ,
Fujitsu VP2000 series, Compcon Spring '90. Intellectual Leverage. Digest of Papers. Thirty-Fifth IEEE Computer Society International Conference, pp.4-11, 1990. ,
DOI : 10.1109/cmpcon.1990.63645
Benchmarking GPUs to tune dense linear algebra, Proceedings of the 2008 ACM. ,
Active Messages: a Mechanism for Integrated Communication and Computation, Proceedings of the 19th Int'l Symp. on Computer Architecture, pp.46-77, 1992. ,
Packaging technology for the NEC SX supercomputer, IEEE Transactions on Components, Hybrids, and Manufacturing Technology, vol.8, issue.4, pp.462-467, 1985. ,
[XCA] XcalableMP: Directive-based language extension for scalable and performance-aware parallel programming. ,
Impact of placing 2 processes with 4 threads each on the machine ,
Impact of placing 2 processes with 4 threads each on the compute cluster ,