Topology-Aware Mappings for Large-Scale Eigenvalue Problems, Euro-Par 2012 Parallel Processing -18th International Conference. T. 7484. Lecture Notes in Computer Science, pp.830-842 ,
The Gemini System Interconnect, 2010 18th IEEE Symposium on High Performance Interconnects, pp.83-87, 2010. ,
DOI : 10.1109/HOTI.2010.23
An Efficient Heuristic Procedure for Partitioning Graphs, Bell System Technical Journal 49, pp.291-307, 1970. ,
DOI : 10.1002/j.1538-7305.1970.tb01770.x
NAS Parallel Benchmark Results, p.42, 1994. ,
DOI : 10.1007/978-94-011-5412-3_14
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.17
Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems, Computer Science - Research and Development, vol.49, issue.2???3, pp.3-4, 2011. ,
DOI : 10.1007/s00450-011-0168-y
Topology Aware Task Mapping In : Encyclopedia of Parallel Computing (to appear) Sous la dir. de D. Padua, p.61, 2011. ,
Benefits of Topology Aware Mapping for Mesh Interconnects, Parallel Processing Letters (Special issue on Large-Scale Parallel Processing), pp.549-566, 2008. ,
DOI : 10.1142/S0129626408003569
Cilk : an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming. PPOPP '95, pp.207-216, 1995. ,
Rank reordering for MPI communication optimization, Computer & Fluids (jan. 2012) (cf, pp.23-60 ,
DOI : 10.1016/j.compfluid.2012.01.019
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.2010-2037, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Handling Application-Induced Load Imbalance using Parallel Objects, Parallel and Distributed Computing for Symbolic and Irregular Applications, pp.167-181, 2000. ,
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014. ,
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319
MPIPP, Proceedings of the 20th annual international conference on Supercomputing , ICS '06, pp.353-360, 2006. ,
DOI : 10.1145/1183401.1183451
Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference on -, pp.157-172, 1969. ,
DOI : 10.1145/800195.805928
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, Selected Papers from EuroPVM, pp.634-644, 2006. ,
DOI : 10.1016/j.parco.2007.06.003
URL : https://hal.archives-ouvertes.fr/hal-00344327
A Profile Based Approach for Topology Aware MPI Rank Placement ,
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.27-36, 2014. ,
DOI : 10.1109/IPDPS.2014.15
Zoltan data management services for parallel dynamic applications, Computing in Science & Engineering, vol.4, issue.2, pp.90-97, 2002. ,
DOI : 10.1109/5992.988653
Method and System for Optimizing Communication in MPI Programs for an Execution Environment, p.24, 2008. ,
Scotch and LibScotch 5.1 User's Guide. http://www.labri. fr, ScAlApplix project, p.66, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00410332
Static mapping by dual recursive bipartitioning of process architecture graphs, Proceedings of IEEE Scalable High Performance Computing Conference, pp.486-493, 1994. ,
DOI : 10.1109/SHPCC.1994.296682
Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings of the 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004. ,
DOI : 10.1007/978-3-540-30218-6_19
Computers and Intractability ; A Guide to the Theory of NP-Completeness, p.14, 1990. ,
Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, pp.2014-2042, 2014. ,
DOI : 10.1109/ICPPW.2014.38
URL : https://hal.archives-ouvertes.fr/hal-01010599
Rank reordering strategy for MPI topology creation functions, pp.188-195, 1998. ,
DOI : 10.1007/BFb0056575
The Chaco User's Guide : Version 2.0. Rapp. tech, SAND94?2692. Sandia National Laboratory, pp.23-44, 1994. ,
An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations, SIAM Journal on Scientific Computing, vol.16, issue.2, pp.452-469, 1995. ,
DOI : 10.1137/0916028
Generic topology mapping strategies for large-scale parallel architectures, Proceedings of the international conference on Supercomputing, ICS '11, pp.75-84, 2011. ,
DOI : 10.1145/1995896.1995909
The scalable process topology interface of MPI 2.2, Concurrency and Computation : Practice and Experience, pp.293-310, 2010. ,
DOI : 10.1002/cpe.1643
The scalable process topology interface of MPI 2.2, Concurrency and Computation : Practice and Experience 23, pp.293-310, 2011. ,
DOI : 10.1002/cpe.1643
Adaptive MPI, Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), LNCS 2958. College Station, pp.306-322, 2003. ,
DOI : 10.1007/978-3-540-24644-2_20
Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp.527-531, 2011. ,
DOI : 10.1109/CLUSTER.2011.59
Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments, Computer & Fluids (avr. 2012) (cf, p.22 ,
DOI : 10.1016/j.compfluid.2012.04.024
Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS???MP, The Astrophysical Journal Supplement Series, vol.165, issue.1, pp.188-228, 2006. ,
DOI : 10.1086/504594
Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters, 2008 37th International Conference on Parallel Processing, pp.141-148, 2008. ,
DOI : 10.1109/ICPP.2008.42
Implementing the MPI Process Topology Mechanism, ACM/IEEE SC 2002 Conference (SC'02), pp.1-14, 2002. ,
DOI : 10.1109/SC.2002.10045
Cray MPT : MPI on the Cray XT ,
Massively parallel cosmological simulations with ChaNGa, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.62-90, 2008. ,
DOI : 10.1109/IPDPS.2008.4536319
Approximation Algorithms for the Weighted Independent Set Problem, LNCS, vol.3787, pp.341-350, 2005. ,
DOI : 10.1007/11604686_30
Charm++ : Parallel Programming with Message- Driven Objects " . In : Parallel Programming using C++. Sous la dir, pp.175-213, 1996. ,
Programming Models at Exascale : Adaptive Runtime Systems , Incomplete Simple Languages, and Interoperability, In : The International Journal of High Performance Computing Applications, vol.23, issue.4, pp.344-346, 2009. ,
CHARM++ : A Portable Concurrent Object Oriented System Based on C++, Proceedings of Object-Oriented Programming, Systems, Languages and Applications (OOPSLA) 93, pp.91-108, 1993. ,
Charm++ and AMPI : Adaptive Runtime Strategies via Migratable Objects Advanced Computational Infrastructures for Parallel and Distributed Applications, pp.265-282, 2009. ,
Programming Petascale Applications with Charm++ and AMPI " . In : Petascale Computing : Algorithms and Applications, pp.421-441, 2008. ,
Charm++ for Productivity and Performance : A Submission to the 2011 HPC Class II Challenge. Rapp. tech. 11-49, p.61 ,
Migratable Objects + Active Messages + Adaptive Runtime = Productivity + Performance A Submission to 2012 HPC Class II Challenge. Rapp. tech. 12-47, pp.2012-61 ,
METIS -Unstructured Graph Partitioning and Sparse Matrix Ordering System, pp.44-66 ,
Multilevel Algorithms for Multi-Constraint Graph Partitioning, Proceedings of the IEEE/ACM SC98 Conference, pp.1-13, 1998. ,
DOI : 10.1109/SC.1998.10018
Multilevelk-way Partitioning Scheme for Irregular Graphs, Journal of Parallel and Distributed Computing, vol.48, issue.1, pp.96-129, 1998. ,
DOI : 10.1006/jpdc.1997.1404
Aufgabe 300, Jahresber. Deutsch. Math. -Verein, vol.58, p.36, 1955. ,
Process Distance-Aware Adaptive MPI Collective Communications, 2011 IEEE International Conference on Cluster Computing, pp.196-204, 2011. ,
DOI : 10.1109/CLUSTER.2011.30
LeanMD : A Charm++ framework for high performance molecular dynamics simulation on large parallel machines " . Mém.de mast, pp.62-79, 2004. ,
Deploying a Large Petascale System : The Blue Waters Experience, 2014 International Conference on Computational Science, pp.198-209, 2014. ,
A distributed dynamic load balancer for iterative applications, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '13, pp.1-15, 2013. ,
DOI : 10.1145/2503210.2503284
Thermal aware automated load balancing for HPC applications, 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-8, 2013. ,
DOI : 10.1109/CLUSTER.2013.6702627
Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, In : EuroPVM/MPI. T. Lecture Notes in Computer Science. Espoo, vol.5759, issue.26, pp.104-115, 2009. ,
DOI : 10.1007/978-3-642-03770-2_17
URL : https://hal.archives-ouvertes.fr/inria-00392581
Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, EuroMPI. T. Lecture Notes in Computer Science. Santorini, vol.6960, pp.39-49, 2011. ,
DOI : 10.1007/978-3-642-24449-0_7
URL : https://hal.archives-ouvertes.fr/hal-00643151
An O( (V )E) algorithm for finding a maximum matching in general graphs, Proc. 21st Ann IEEE Symp. Foundations of Computer Science, pp.17-27, 1980. ,
NAMD: a Parallel, Object-Oriented Molecular Dynamics Program, International Journal of High Performance Computing Applications, vol.10, issue.4, pp.251-268, 1996. ,
DOI : 10.1177/109434209601000401
Experimental Analysis of the Dual Recursive Bipartitioning Algorithm for Static Mapping. Rapp. tech, p.24, 1996. ,
Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems, Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on. IEEE. 2012, pp.236-243 ,
A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems, Parallel Processing (ICPP), 2012 41st International Conference on. IEEE. 2012, pp.118-127 ,
A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), pp.2014-2019, 2014. ,
DOI : 10.1109/HPCSim.2014.6903790
URL : https://hal.archives-ouvertes.fr/hal-00985598
Multi-core and Network Aware MPI Topology Functions, EuroMPI. 2011, pp.50-60 ,
DOI : 10.1007/978-3-642-24449-0_8
A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model, 2010 22nd International Symposium on Computer Architecture and High Performance Computing, pp.2010-61 ,
DOI : 10.1109/SBAC-PAD.2010.18
Multicore Aware Process Mapping and its Impact on Communication Overhead of Parallel Applications, Proceedings of the IEEE Symp. on Comp. and Comm. Juil, pp.811-817, 2009. ,
Issues in the study of graph embeddings In : Graphtheoretic Concepts in Computer Science ,
Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning (Distinguished Paper), Proceedings from the 6th International Euro-Par Conference on Parallel Processing ,
Performance Effects of Node Mappings on the IBM BlueGene/L Machine, Euro-Par, pp.1005-1013, 2005. ,
DOI : 10.1007/11549468_110
NetPIPE : A Network Protocol Independent Performance Evaluator, IASTED International Conference on Intelligent Information Management and Systems, p.55, 1996. ,
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.12-22, 2012. ,
DOI : 10.1109/SC.2012.47
POWER4 system microarchitecture, IBM Journal of Research and Development, vol.46, issue.1, pp.5-25, 2002. ,
DOI : 10.1147/rd.461.0005
EZTrace : a generic framework for performance analysis Poster Session, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.7-32, 2011. ,
UPC Language Specifications, v1, p.25, 2005. ,
Optimized process placement for collective I/O operations, Proceedings of the 20th European MPI Users' Group Meeting on, EuroMPI '13, p.24 ,
DOI : 10.1145/2488551.2488567
Blue Gene System Software -Topology Mapping for, pp.116-138, 2006. ,
Topology Mapping for Blue Gene/L Supercomputer, ACM/IEEE SC 2006 Conference (SC'06), pp.116-60, 2006. ,
DOI : 10.1109/SC.2006.63
FACT, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.32, 2009. ,
DOI : 10.1145/1654059.1654087
Process Mapping for MPI Collective Communications, pp.81-92, 2009. ,
DOI : 10.1109/71.642949
Achieving high performance on extremely large parallel machines : performance prediction and load balancing, Thèse de doct, p.61, 2005. ,
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers, 2010 39th International Conference on Parallel Processing Workshops, pp.61-72, 2010. ,
DOI : 10.1109/ICPPW.2010.65
Periodic hierarchical load balancing for large supercomputers, IJHPCA) (mar. 2011) (cf, pp.61-72 ,
DOI : 10.1177/1094342010394383
Hierarchical Collectives in MPICH2, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.325-326, 2009. ,
DOI : 10.1007/978-3-642-03770-2_41
Process Placement in Multicore Clusters : Algorithmic Issues and Practical Techniques, Anglais. In : IEEE Transactions on Parallel and Distributed Systems, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00803548
Communication and Topology-aware Load Balancing in Charm++ with TreeMatch, pp.2013-60 ,
TreeMatch : Un algorithme de placement de processus sur architectures multicoeurs, Français. In : RenPAR -21e Rencontres Francophones du Parallélisme, 2013. ,
Communication-aware load balancing with TreeMatch in Charm++, Presented at the 9th workshop of the Joint Laboratory for Petascale Computing, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00851148
Distributed communication-aware load balancing with Tree- Match in Charm++, Presented at the 11th workshop of the Joint Laboratory for Petascale Computing, 2014. ,
Distributed communication-aware load balancing with Tree- Match in Charm++, Presented at the 9th Scheduling for Large Scale Systems Workshop, 2014. ,
Load balacing and affinities between processes with TreeMatch in Charm++ : preliminary results and prospects, Presented at the 7th workshop of the Joint Laboratory for Petascale Computing, 2012. ,
Processes placement on multicore Dynamic load balancing in Charm++, Presented at the 10th Annual Charm++ Workshop, 2012. ,