K. Kurowski, W. Back, W. Dubitzky, L. Gulyás, G. Kampis et al., Complex System Simulations with QosCosGrid, Proceedings of the 9th International Conference on Computational Science: Part I, ICCS '09, pp.387-396, 2009.
DOI : 10.1007/978-3-642-01970-8_38

K. Kurowski, T. Piontek, P. Kopta, M. Mamoński, and B. Bosak, Parallel Large Scale Simulations in the PL-Grid Environment, Computational Methods in Science and Technology, vol.Special Issue, issue.1, pp.47-56, 2010.
DOI : 10.12921/cmst.2010.SI.01.47-56

U. Ramachandran, H. Venkateswaran, A. Sivasubramaniam, and A. Singla, Issues in understanding the scalability of parallel systems, Proceedings of the First International Workshop on Parallel Processing, pp.399-404, 1994.

X. Liu, J. Zhan, K. Zhan, W. Shi, L. Yuan et al., Automatic performance debugging of SPMD-style parallel programs, Journal of Parallel and Distributed Computing, vol.71, issue.7, pp.925-937, 2011.
DOI : 10.1016/j.jpdc.2011.03.006

P. Gschwandtner, T. Fahringer, and R. Prodan, Performance Analysis and Benchmarking of the Intel SCC, 2011 IEEE International Conference on Cluster Computing, pp.139-149, 2011.
DOI : 10.1109/CLUSTER.2011.24

A. Pesterev, N. Zeldovich, and R. T. Morris, Locating cache performance bottlenecks using data profiling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.335-348, 2010.

M. Roth, M. J. Best, C. Mustard, and A. Fedorova, Deconstructing the overhead in parallel applications, 2012 IEEE International Symposium on Workload Characterization (IISWC), pp.59-68, 2012.
DOI : 10.1109/IISWC.2012.6402901

H. Su, M. Billingsley, and A. D. George, A generalized, distributed analysis system for optimization of Parallel Applications, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-8, 2009.
DOI : 10.1109/IPDPS.2009.5160938

J. R. Hammond, S. Krishnamoorthy, S. Shende, N. A. Romero, and A. D. Malony, Performance characterization of global address space applications: a case study with NWChem, Concurrency and Computation: Practice and Experience, pp.135-154, 2012.

R. K. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, 1991.

E. A. Brewer, C. N. Dellarocas, A. Colbrook, and W. E. Weihl, Proteus: A high-performance parallel-architecture simulator, 1991.
DOI : 10.1145/149439.133146

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7487

D. Ferrari, G. Serazzi, and A. Zeigner, Measurement and tuning of computer systems, Int. CMG Conference, pp.793-794, 1984.

M. Schulz, J. A. Levine, P. Bremer, T. Gamblin, and V. Pascucci, Interpreting Performance Data across Intuitive Domains, 2011 International Conference on Parallel Processing, pp.206-215, 2011.
DOI : 10.1109/ICPP.2011.60

F. Petrini, D. J. Kerbyson, and S. Pakin, The Case of the Missing Supercomputer Performance, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, p.55, 2003.
DOI : 10.1145/1048935.1050204

M. Burtscher, B. Kim, J. Diamond, J. Mccalpin, L. Koesterke et al., PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
DOI : 10.1109/SC.2010.41

O. Sopeju, M. Burtscher, A. Rane, and J. Browne, AutoSCOPE: Automatic Suggestions for Code Optimizations using PerfExpert, 2011 International Conference on Parallel and Distributed Processing Techniques and Applications, pp.19-25, 2011.

H. Kotakemori, H. Hasegawa, and A. Nishida, Performance evaluation of a parallel iterative method library using OpenMP, Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05), p.436, 2005.
DOI : 10.1109/HPCASIA.2005.74

S. Shende, A. D. Malony, A. Morris, S. Parker, J. Davison de St. Germain et al., Performance evaluation of adaptive scientific applications using TAU, Parallel Computational Fluid Dynamics 2005, pp.421-428, 2006.

G. F. Pfister, Aspects of the InfiniBand architecture, Proceedings of the 2001 IEEE International Conference on Cluster Computing, pp.369-371, 2001.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, vol.22, pp.789-828, 1996.

R. Frost, MPICH performance characteristics and considerations, Proceedings. Second MPI Developer's Conference, pp.199-202, 1996.
DOI : 10.1109/MPIDC.1996.534115

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings of the 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

T. Hoefler, T. Mehlan, A. Lumsdaine, and W. Rehm, Netgauge: A network performance measurement framework, Proceedings of High Performance Computing and Communications, HPCC'07, pp.659-671, 2007.

P. Beckman, K. Iskra, K. Yoshii, S. Coghlan, and A. Nataraj, Benchmarking the effects of operating system interference on extreme-scale parallel machines, Cluster Computing, vol.127, issue.2/3, pp.3-16, 2008.
DOI : 10.1007/s10586-007-0047-2

Sequoia Benchmarks, https://asc.llnl.gov/sequoia/benchmarks

R. Reussner, P. Sanders, and J. L. Träff, SKaMPI: A Comprehensive Benchmark for Public Benchmarking of MPI, Scientific Programming, pp.55-65, 2002.
DOI : 10.1155/2002/202839

P. Luszczek, J. J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas et al., Introduction to the HPC Challenge benchmark suite, 2005.

J. Dongarra and M. A. Heroux, Toward a new metric for ranking high performance computing systems, 2013.

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS Parallel Benchmarks, The International Journal of Supercomputer Applications, 1991.

S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, A Portable Programming Interface for Performance Evaluation on Modern Processors, International Journal of High Performance Computing Applications, vol.14, issue.3, pp.189-204, 2000.
DOI : 10.1177/109434200001400303

M. Abrams, Design of a measurement instrument for distributed systems, SIGMETRICS, p.274, 1988.

J. Gait, A debugger for concurrent programs, Software: Practice and Experience, pp.539-554, 1985.

P. A. Emrath, S. Ghosh, and D. A. Padua, Event synchronization analysis for debugging parallel programs, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, Supercomputing '89, pp.580-588, 1989.
DOI : 10.1145/76263.76329

A. D. Malony, J. W. Arendt, R. A. Aydt, D. A. Reed, D. Grabas et al., An Integrated Performance Data Collection, Analysis, and Visualization System, 1989.

S. Moore, D. Cronk, S. Shende, and A. Malony, Loop-level profiling and analysis of DoD applications using TAU, HPCMP Users Group Conference, pp.378-383, 2006.

B. J. N. Wylie, F. Wolf, B. Mohr, and M. Geimer, Integrated runtime measurement summarisation and selective event tracing for scalable parallel execution performance diagnosis, Proc. of the 8th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), pp.460-469, 2006.

F. Cappello, E. Caron, M. Dayde, F. Desprez, Y. Jegou et al., Grid'5000: a large scale and highly reconfigurable grid experimental testbed, The 6th IEEE/ACM International Workshop on Grid Computing, pp.99-106, 2005.
DOI : 10.1109/GRID.2005.1542730

URL : https://hal.archives-ouvertes.fr/hal-00684943

D. Balouek, A. Carpen-Amarie, G. Charrier, F. Desprez, E. Jeannot et al., Adding Virtualization Capabilities to the Grid'5000 Testbed, 2012.
DOI : 10.1007/978-3-319-04519-1_1

URL : https://hal.archives-ouvertes.fr/hal-00946971

N. Capit, G. Da Costa, Y. Georgiou, G. Huard, C. Martin et al., A batch scheduler with high level components, CCGrid 2005, IEEE International Symposium on Cluster Computing and the Grid, pp.776-783, 2005.
DOI : 10.1109/CCGRID.2005.1558641

URL : https://hal.archives-ouvertes.fr/hal-00005106

E. Jeanvoine, L. Sarzyniec, and L. Nussbaum, Kadeploy3: Efficient and Scalable Operating System Provisioning for HPC Clusters, 2012.

N. Saboo, A. K. Singla, J. M. Unger, and L. V. Kalé, Emulating petaflops machines and Blue Gene, Proceedings of the 15th International Parallel & Distributed Processing Symposium, IPDPS '01, p.195, 2001.
DOI : 10.1109/ipdps.2001.925206

URL : http://charm.cs.illinois.edu/newPapers/01-04/paper.pdf

A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostić et al., Scalability and accuracy in a large-scale network emulator, ACM SIGOPS Operating Systems Review, vol.36, issue.SI, pp.271-284, 2002.
DOI : 10.1145/844128.844154

K. Webb, M. Hibler, R. Ricci, A. Clements, and J. Lepreau, Implementing the Emulab-PlanetLab portal: Experience and lessons learned, Workshop on Real, Large Distributed Systems (WORLDS), 2004.

B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad et al., An integrated experimental environment for distributed systems and networks, Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pp.255-270, 2002.

B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson et al., PlanetLab, ACM SIGCOMM Computer Communication Review, vol.33, issue.3, pp.3-12, 2003.
DOI : 10.1145/956993.956995

R. N. Calheiros, M. A. S. Netto, C. A. F. De Rose et al., EMUSIM: an integrated emulation and simulation environment for modeling, evaluation, and validation of performance of Cloud computing applications, Software: Practice and Experience, vol.32, issue.2, pp.595-612, 2013.
DOI : 10.1002/spe.2124

P. Dickens, P. Heidelberger, and D. Nicol, Parallelized direct execution simulation of message-passing parallel programs, IEEE Transactions on Parallel and Distributed Systems, vol.7, issue.10, pp.1090-1105, 1996.
DOI : 10.1109/71.539740

R. Bagrodia, E. Deelman, and T. Phan, Parallel Simulation of Large-Scale Parallel Applications, International Journal of High Performance Computing Applications, vol.15, issue.1, pp.3-12, 2001.
DOI : 10.1177/109434200101500101

A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia et al., A Framework for Application Performance Modeling and Prediction, Proc. of the ACM/IEEE Conference on Supercomputing (SC'02), 2002.

G. Zheng, G. Kakulapati, and L. Kale, BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th International Parallel and Distributed Processing Symposium, 2004.

E. León, R. Riesen, and A. Maccabe, Instruction-level simulation of a cluster at scale, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654063

B. Penoff, A. Wagner, M. Tüxen, and I. Rüngeler, MPI-NeTSim: A Network Simulation Module for MPI, 2009 15th International Conference on Parallel and Distributed Systems, 2009.
DOI : 10.1109/ICPADS.2009.116

T. Hoefler, C. Siebert, and A. Lumsdaine, LogGOPSim: Simulating Large-Scale Applications in the LogGOPS Model, Proc. of the ACM Workshop on Large-Scale System and Application Performance, pp.597-604, 2010.

M. Tikir, M. Laurenzano, L. Carrington, and A. Snavely, PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications, Proc. of the 15th International EuroPar Conference, pp.135-148, 2009.
DOI : 10.1007/BFb0052218

A. Núñez, J. Fernández, J. Garcia, F. Garcia, and J. Carretero, New techniques for simulating high performance MPI applications on large storage networks, 2008 IEEE International Conference on Cluster Computing, pp.40-57, 2010.
DOI : 10.1109/CLUSTR.2008.4663806

J. Zhai, W. Chen, and W. Zheng, PHANTOM: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node, Proc. of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.305-314, 2010.

M. Hermanns, M. Geimer, F. Wolf, and B. Wylie, Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.78-84, 2009.
DOI : 10.1109/PDP.2009.50

V. S. Adve, R. Bagrodia, E. Deelman, and R. Sakellariou, Compiler-Optimized Simulation of Large-Scale Applications on High Performance Architectures, Journal of Parallel and Distributed Computing, vol.62, issue.3, pp.393-426, 2002.
DOI : 10.1006/jpdc.2001.1800

P. Clauss, M. Stillwell, S. Genaud, F. Suter, H. Casanova et al., Single Node On-Line Simulation of MPI Applications with SMPI, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.69

URL : https://hal.archives-ouvertes.fr/inria-00527150

S. Prakash, E. Deelman, and R. Bagrodia, Asynchronous parallel simulation of parallel programs, IEEE Transactions on Software Engineering, vol.26, issue.5, pp.385-400, 2000.
DOI : 10.1109/32.846297

G. Zheng, T. Wilmarth, P. Jagadishprasad, and L. Kalé, Simulation-Based Performance Prediction for Large Parallel Machines, International Journal of Parallel Programming, vol.15, issue.2-3, pp.183-207, 2005.
DOI : 10.1007/s10766-005-3582-6

R. Badia, J. Labarta, J. Giménez, and F. Escalé, Dimemas: Predicting MPI applications behavior in Grid environments, Proc. of the Workshop on Grid Applications and Programming Tools, 2003.

H. Casanova, A. Legrand, and M. Quinson, SimGrid: A Generic Framework for Large-Scale Distributed Experiments, Tenth International Conference on Computer Modeling and Simulation (UKSim 2008), 2008.
DOI : 10.1109/UKSIM.2008.28

URL : https://hal.archives-ouvertes.fr/inria-00260697

P. Velho and A. Legrand, Accuracy study and improvement of network simulation in the SimGrid framework, Proceedings of the Second International ICST Conference on Simulation Tools and Techniques, 2009.
DOI : 10.4108/ICST.SIMUTOOLS2009.5592

URL : https://hal.archives-ouvertes.fr/inria-00361031

T. Hoefler, T. Schneider, and A. Lumsdaine, Accurately Measuring Overhead, Communication Time and Progression of Blocking and Nonblocking Collective Operations at Massive Scale, International Journal of Parallel, Emergent and Distributed Systems, vol.25, issue.4, pp.241-258, 2010.

F. Desprez, G. S. Markomanolis, and F. Suter, Evaluation of Profiling Tools for the Acquisition of Time Independent Traces, 2013.

R. Kufrin, Perfsuite: An Accessible, Open Source Performance Analysis Environment for Linux, Proceedings of the 6th International Conference on Linux Clusters: The HPC Revolution 2005 (LCI-05), Chapel Hill, NC, 2005.

J. Vetter and M. Mccracken, Statistical Scalability Analysis of Communication Operations in Distributed Applications, Proceedings of the 2001 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'01), pp.123-132, 2001.

A. Chan, W. Gropp, and E. Lusk, User's Guide for MPE: Extensions for MPI Programs, 1998.

A. Chan, W. Gropp, and E. Lusk, An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace Files, Scientific Programming, pp.155-165, 2008.

N. J. Wright, S. Smallen, C. M. Olschanowsky, J. Hayes, and A. Snavely, Measuring and Understanding Variation in Benchmark Performance, 2009 DoD High Performance Computing Modernization Program Users Group Conference, pp.438-443, 2009.
DOI : 10.1109/HPCMP-UGC.2009.72

M. Geimer, F. Wolf, B. Wylie, and B. Mohr, A scalable tool architecture for diagnosing wait states in massively parallel applications, Parallel Computing, vol.35, issue.7, pp.375-388, 2009.
DOI : 10.1016/j.parco.2009.02.003

S. Shende and A. Malony, The Tau Parallel Performance System, International Journal of High Performance Computing Applications, vol.20, issue.2, pp.287-311, 2006.
DOI : 10.1177/1094342006064482

A. Knüpfer, H. Brunst, J. Doleschal, M. Jurenz, M. Lieber et al., The Vampir Performance Analysis Tool-Set, Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, pp.139-155, 2008.
DOI : 10.1007/978-3-540-68564-7_9

D. an Mey, S. Biersdorf, C. Bischof, K. Diethelm, D. Eschweiler et al., Score-P: A unified performance measurement system for petascale applications, Proc. of the CiHPC: Competence in High Performance Computing, HPC Status Konferenz der Gauß-Allianz e.V, pp.1-12, 2010.

M. Pettersson, Perfctr: the Linux Performance Monitoring Counters Driver

M. Geimer, F. Wolf, A. Knüpfer, B. Mohr, and B. J. Wylie, A Parallel Trace-Data Interface for Scalable Performance Analysis, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, PARA'06, pp.398-408, 2007.
DOI : 10.1007/978-3-540-75755-9_49

D. Eschweiler, M. Wagner, M. Geimer, A. Knüpfer, W. E. Nagel et al., Open Trace Format 2: The next generation of scalable trace formats and support libraries, Proc. of the Intl. Conference on Parallel Computing (ParCo), 2011.

F. Wolf and B. Mohr, EPILOG binary trace-data format, Zentralinstitut für Angewandte Mathematik, Forschungszentrum Jülich. FZJ-ZAM, 2004.

B. Claudel, G. Huard, and O. Richard, TakTuk, adaptive deployment of remote executions, Proceedings of the 18th ACM international symposium on High performance distributed computing, HPDC '09, pp.91-100, 2009.

G. Markomanolis and F. Suter, Time-Independent Trace Acquisition Framework - A Grid'5000 How-to, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00593842

F. Desprez, G. S. Markomanolis, M. Quinson, and F. Suter, Assessing the Performance of MPI Applications through Time-Independent Trace Replay, 2011 40th International Conference on Parallel Processing Workshops, pp.467-476, 2011.
DOI : 10.1109/ICPPW.2011.33

URL : https://hal.archives-ouvertes.fr/inria-00546992

M. Noeth, F. Mueller, M. Schulz, and B. R. de Supinski, Scalable Compression and Replay of Communication Traces in Massively Parallel Environments, Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007.

E. Jeannot, Improving Middleware Performance with AdOC: An Adaptive Online Compression Library for Data Transfer, 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
DOI : 10.1109/IPDPS.2005.254

URL : https://hal.archives-ouvertes.fr/inria-00000285

S. De Munck, K. Vanmechelen, and J. Broeckhove, Improving the Scalability of SimGrid Using Dynamic Routing, Proceedings of the 9th International Conference on Computational Science: Part I, ICCS '09, pp.406-415, 2009.
DOI : 10.1007/978-3-642-01970-8_40

L. Bobelin, A. Legrand, D. A. Marquez, P. Navarro, M. Quinson et al., Scalable Multi-purpose Network Representation for Large Scale Distributed System Simulation, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.220-227, 2012.
DOI : 10.1109/CCGrid.2012.31

URL : https://hal.archives-ouvertes.fr/hal-00650233

P. Bedaride, S. Genaud, A. Degomme, A. Legrand, G. Markomanolis et al., Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00821446

P. Velho, L. Schnorr, H. Casanova, and A. Legrand, On the validity of flow-level TCP network models for grid and cloud simulations, ACM Transactions on Modeling and Computer Simulation (TOMACS), under revision, 2013.
DOI : 10.1145/2517448

URL : https://hal.archives-ouvertes.fr/hal-00872476

F. Desprez, G. S. Markomanolis, and F. Suter, Improving the accuracy and efficiency of time-independent trace replay, SC Workshops, 2012.

N. Rajovic, N. Puzovic, L. Vilanova, C. Villavieja, and A. Ramirez, The low-power architecture approach towards exascale computing, Proceedings of the second workshop on Scalable algorithms for large-scale systems, ScalA '11, 2011.
DOI : 10.1145/2133173.2133175