H. M. Aktulga, Topology-Aware Mappings for Large-Scale Eigenvalue Problems, Euro-Par 2012 Parallel Processing - 18th International Conference, Lecture Notes in Computer Science, vol.7484, pp.830-842

R. Alverson, D. Roweth, and L. Kaplan, The Gemini System Interconnect, 2010 18th IEEE Symposium on High Performance Interconnects, pp.83-87, 2010.
DOI : 10.1109/HOTI.2010.23

B. W. Kernighan and S. Lin, An Efficient Heuristic Procedure for Partitioning Graphs, Bell System Technical Journal 49, pp.291-307, 1970.
DOI : 10.1002/j.1538-7305.1970.tb01770.x

D. H. Bailey, NAS Parallel Benchmark Results, p.42, 1994.
DOI : 10.1007/978-94-011-5412-3_14
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.17

P. Balaji, Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems, Computer Science - Research and Development, vol.49, issue.2-3, pp.3-4, 2011.
DOI : 10.1007/s00450-011-0168-y

A. Bhatele, Topology Aware Task Mapping, In: Encyclopedia of Parallel Computing (to appear), ed. by D. Padua, p.61, 2011.

A. Bhatelé and L. V. Kalé, Benefits of Topology Aware Mapping for Mesh Interconnects, Parallel Processing Letters (Special issue on Large-Scale Parallel Processing), pp.549-566, 2008.
DOI : 10.1142/S0129626408003569

R. D. Blumofe, Cilk : an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming. PPOPP '95, pp.207-216, 1995.

B. Brandfass, T. Alrutz, and T. Gerhold, Rank reordering for MPI communication optimization, Computers & Fluids, Jan. 2012.
DOI : 10.1016/j.compfluid.2012.01.019

F. Broquedis, hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.2010-2037, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

R. K. Brunner and L. V. Kalé, Handling Application-Induced Load Imbalance using Parallel Objects, Parallel and Distributed Computing for Symbolic and Irregular Applications, pp.167-181, 2000.

H. Casanova, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319

H. Chen, MPIPP, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.353-360, 2006.
DOI : 10.1145/1183401.1183451

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference, pp.157-172, 1969.
DOI : 10.1145/800195.805928

D. Buntinas, G. Mercier, and W. Gropp, Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, Selected Papers from EuroPVM, pp.634-644, 2006.
DOI : 10.1016/j.parco.2007.06.003
URL : https://hal.archives-ouvertes.fr/hal-00344327

D. Solt, A Profile Based Approach for Topology Aware MPI Rank Placement

M. Deveci, Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.27-36, 2014.
DOI : 10.1109/IPDPS.2014.15

K. Devine, Zoltan data management services for parallel dynamic applications, Computing in Science & Engineering, vol.4, issue.2, pp.90-97, 2002.
DOI : 10.1109/5992.988653

E. Duesterwald, R. W. Wisniewski, P. F. Sweeney, G. Cascaval, and S. E. Smith, Method and System for Optimizing Communication in MPI Programs for an Execution Environment, p.24, 2008.

F. Pellegrini, Scotch and LibScotch 5.1 User's Guide, http://www.labri.fr, ScAlApplix project, p.66, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00410332

F. Pellegrini, Static mapping by dual recursive bipartitioning of process architecture graphs, Proceedings of IEEE Scalable High Performance Computing Conference, pp.486-493, 1994.
DOI : 10.1109/SHPCC.1994.296682

E. Gabriel, Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings of the 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, p.14, 1990.

B. Goglin, J. Hursey, and J. M. Squyres, Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, pp.2014-2042, 2014.
DOI : 10.1109/ICPPW.2014.38
URL : https://hal.archives-ouvertes.fr/hal-01010599

T. Hatazaki, Rank reordering strategy for MPI topology creation functions, pp.188-195, 1998.
DOI : 10.1007/BFb0056575

B. Hendrickson and R. Leland, The Chaco User's Guide: Version 2.0, Tech. rep. SAND94-2692, Sandia National Laboratories, pp.23-44, 1994.

B. Hendrickson and R. Leland, An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations, SIAM Journal on Scientific Computing, vol.16, issue.2, pp.452-469, 1995.
DOI : 10.1137/0916028

T. Hoefler and M. Snir, Generic topology mapping strategies for large-scale parallel architectures, Proceedings of the international conference on Supercomputing, ICS '11, pp.75-84, 2011.
DOI : 10.1145/1995896.1995909

T. Hoefler, The scalable process topology interface of MPI 2.2, Concurrency and Computation : Practice and Experience 23, pp.293-310, 2011.
DOI : 10.1002/cpe.1643

C. Huang, O. Lawlor, and L. V. Kalé, Adaptive MPI, Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), LNCS 2958. College Station, pp.306-322, 2003.
DOI : 10.1007/978-3-540-24644-2_20

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp.527-531, 2011.
DOI : 10.1109/CLUSTER.2011.59

S. Ito, K. Goto, and K. Ono, Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments, Computers & Fluids, Apr. 2012.
DOI : 10.1016/j.compfluid.2012.04.024

J. C. Hayes, M. L. Norman, R. A. Fiedler, J. O. Bordner, P. S. Li et al., Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS-MP, The Astrophysical Journal Supplement Series, vol.165, issue.1, pp.188-228, 2006.
DOI : 10.1086/504594

J. Dümmler, T. Rauber, and G. Rünger, Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters, 2008 37th International Conference on Parallel Processing, pp.141-148, 2008.
DOI : 10.1109/ICPP.2008.42

J. L. Träff, Implementing the MPI Process Topology Mechanism, ACM/IEEE SC 2002 Conference (SC'02), pp.1-14, 2002.
DOI : 10.1109/SC.2002.10045

J. L. Whitt, G. Brook, and M. Fahey, Cray MPT : MPI on the Cray XT

P. Jetley, Massively parallel cosmological simulations with ChaNGa, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.62-90, 2008.
DOI : 10.1109/IPDPS.2008.4536319

A. Kako, Approximation Algorithms for the Weighted Independent Set Problem, LNCS, vol.3787, pp.341-350, 2005.
DOI : 10.1007/11604686_30

L. V. Kale and S. Krishnan, Charm++ : Parallel Programming with Message-Driven Objects, In: Parallel Programming using C++, pp.175-213, 1996.

L. V. Kale, Programming Models at Exascale : Adaptive Runtime Systems, Incomplete Simple Languages, and Interoperability, The International Journal of High Performance Computing Applications, vol.23, issue.4, pp.344-346, 2009.

L. V. Kale and S. Krishnan, CHARM++ : A Portable Concurrent Object Oriented System Based on C++, Proceedings of Object-Oriented Programming, Systems, Languages and Applications (OOPSLA) 93, pp.91-108, 1993.

L. V. Kale and G. Zheng, Charm++ and AMPI : Adaptive Runtime Strategies via Migratable Objects, Advanced Computational Infrastructures for Parallel and Distributed Applications, pp.265-282, 2009.

L. V. Kale, Programming Petascale Applications with Charm++ and AMPI, In: Petascale Computing : Algorithms and Applications, pp.421-441, 2008.

L. Kale, Charm++ for Productivity and Performance : A Submission to the 2011 HPC Class II Challenge, Tech. rep. 11-49, p.61

L. Kale, Migratable Objects + Active Messages + Adaptive Runtime = Productivity + Performance : A Submission to 2012 HPC Class II Challenge, Tech. rep. 12-47, 2012.

G. Karypis and V. Kumar, METIS - Unstructured Graph Partitioning and Sparse Matrix Ordering System, pp.44-66

G. Karypis and V. Kumar, Multilevel Algorithms for Multi-Constraint Graph Partitioning, Proceedings of the IEEE/ACM SC98 Conference, pp.1-13, 1998.
DOI : 10.1109/SC.1998.10018

G. Karypis and V. Kumar, Multilevel k-way Partitioning Scheme for Irregular Graphs, Journal of Parallel and Distributed Computing, vol.48, issue.1, pp.96-129, 1998.
DOI : 10.1006/jpdc.1997.1404

M. Kneser, Aufgabe 300, Jahresber. Deutsch. Math.-Verein, vol.58, p.36, 1955.

T. Ma, Process Distance-Aware Adaptive MPI Collective Communications, 2011 IEEE International Conference on Cluster Computing, pp.196-204, 2011.
DOI : 10.1109/CLUSTER.2011.30

V. Mehta, LeanMD : A Charm++ framework for high performance molecular dynamics simulation on large parallel machines, Master's thesis, pp.62-79, 2004.

C. L. Mendes, Deploying a Large Petascale System : The Blue Waters Experience, 2014 International Conference on Computational Science, pp.198-209, 2014.

H. Menon and L. Kalé, A distributed dynamic load balancer for iterative applications, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-15, 2013.
DOI : 10.1145/2503210.2503284

H. Menon, Thermal aware automated load balancing for HPC applications, 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-8, 2013.
DOI : 10.1109/CLUSTER.2013.6702627

G. Mercier and J. Clet-Ortega, Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, In: EuroPVM/MPI, Lecture Notes in Computer Science, vol.5759, Espoo, pp.104-115, 2009.
DOI : 10.1007/978-3-642-03770-2_17
URL : https://hal.archives-ouvertes.fr/inria-00392581

G. Mercier and E. Jeannot, Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, In: EuroMPI, Lecture Notes in Computer Science, vol.6960, Santorini, pp.39-49, 2011.
DOI : 10.1007/978-3-642-24449-0_7
URL : https://hal.archives-ouvertes.fr/hal-00643151

S. Micali and V. V. Vazirani, An O(√|V|·|E|) algorithm for finding a maximum matching in general graphs, Proc. 21st Ann. IEEE Symp. Foundations of Computer Science, pp.17-27, 1980.

M. Nelson, NAMD: a Parallel, Object-Oriented Molecular Dynamics Program, International Journal of High Performance Computing Applications, vol.10, issue.4, pp.251-268, 1996.
DOI : 10.1177/109434209601000401

F. Pellegrini and J. Roman, Experimental Analysis of the Dual Recursive Bipartitioning Algorithm for Static Mapping, Tech. rep., p.24, 1996.

L. L. Pilla, Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), pp.236-243, 2012.

L. L. Pilla, A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems, 2012 41st International Conference on Parallel Processing (ICPP), pp.118-127, 2012.

B. Putigny, B. Goglin, and D. Barthou, A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.2014-2019, 2014.
DOI : 10.1109/HPCSim.2014.6903790
URL : https://hal.archives-ouvertes.fr/hal-00985598

M. J. Rashti, Multi-core and Network Aware MPI Topology Functions, EuroMPI, pp.50-60, 2011.
DOI : 10.1007/978-3-642-24449-0_8

E. R. Rodrigues, A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model, 2010 22nd International Symposium on Computer Architecture and High Performance Computing, 2010.
DOI : 10.1109/SBAC-PAD.2010.18

E. Rodrigues, Multicore Aware Process Mapping and its Impact on Communication Overhead of Parallel Applications, Proceedings of the IEEE Symp. on Comp. and Comm., pp.811-817, July 2009.

A. L. Rosenberg, Issues in the study of graph embeddings, In: Graph-theoretic Concepts in Computer Science

K. Schloegel, G. Karypis, and V. Kumar, Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning (Distinguished Paper), Proceedings from the 6th International Euro-Par Conference on Parallel Processing

B. E. Smith and B. Bode, Performance Effects of Node Mappings on the IBM BlueGene/L Machine, Euro-Par, pp.1005-1013, 2005.
DOI : 10.1007/11549468_110

Q. O. Snell, A. R. Mikler, and J. L. Gustafson, NetPIPE : A Network Protocol Independent Performance Evaluator, IASTED International Conference on Intelligent Information Management and Systems, p.55, 1996.

H. Subramoni, Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.12-22, 2012.
DOI : 10.1109/SC.2012.47

J. M. Tendler, POWER4 system microarchitecture, IBM Journal of Research and Development, vol.46, issue.1, pp.5-25, 2002.
DOI : 10.1147/rd.461.0005

F. Trahay, EZTrace : a generic framework for performance analysis, Poster Session, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.7-32, 2011.

UPC Consortium, UPC Language Specifications, v1, p.25, 2005.

V. Venkatesan, Optimized process placement for collective I/O operations, Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI '13, p.24
DOI : 10.1145/2488551.2488567

H. Yu, I. Chung, and J. E. Moreira, Blue Gene System Software - Topology Mapping for Blue Gene/L Supercomputer, pp.116-138, 2006.

H. Yu, I. Chung, and J. Moreira, Topology Mapping for Blue Gene/L Supercomputer, ACM/IEEE SC 2006 Conference (SC'06), pp.116-60, 2006.
DOI : 10.1109/SC.2006.63

J. Zhai, FACT, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.32, 2009.
DOI : 10.1145/1654059.1654087

J. Zhang, Process Mapping for MPI Collective Communications, pp.81-92, 2009.
DOI : 10.1109/71.642949

G. Zheng, Achieving high performance on extremely large parallel machines : performance prediction and load balancing, PhD thesis, p.61, 2005.

G. Zheng, Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers, 2010 39th International Conference on Parallel Processing Workshops, pp.61-72, 2010.
DOI : 10.1109/ICPPW.2010.65

G. Zheng, Periodic hierarchical load balancing for large supercomputers, International Journal of High Performance Computing Applications (IJHPCA), Mar. 2011.
DOI : 10.1177/1094342010394383

H. Zhu, Hierarchical Collectives in MPICH2, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.325-326, 2009.
DOI : 10.1007/978-3-642-03770-2_41

E. Jeannot, G. Mercier, and F. Tessier, Process Placement in Multicore Clusters : Algorithmic Issues and Practical Techniques, IEEE Transactions on Parallel and Distributed Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00803548

E. Jeannot, Communication and Topology-aware Load Balancing in Charm++ with TreeMatch, 2013.

E. Jeannot, G. Mercier, and F. Tessier, TreeMatch : Un algorithme de placement de processus sur architectures multicoeurs (in French), RenPAR - 21e Rencontres Francophones du Parallélisme, 2013.

F. Tessier, Communication-aware load balancing with TreeMatch in Charm++, Presented at the 9th workshop of the Joint Laboratory for Petascale Computing, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00851148

F. Tessier, Distributed communication-aware load balancing with TreeMatch in Charm++, Presented at the 11th workshop of the Joint Laboratory for Petascale Computing, 2014.

F. Tessier, Distributed communication-aware load balancing with TreeMatch in Charm++, Presented at the 9th Scheduling for Large Scale Systems Workshop, 2014.

F. Tessier, Load balancing and affinities between processes with TreeMatch in Charm++ : preliminary results and prospects, Presented at the 7th workshop of the Joint Laboratory for Petascale Computing, 2012.

F. Tessier, Processes placement on multicore Dynamic load balancing in Charm++, Presented at the 10th Annual Charm++ Workshop, 2012.