D. C. Arnold, D. H. Ahn, B. R. de Supinski et al., Stack trace analysis for large scale debugging, Parallel and Distributed Processing Symposium, pp.1-10, 2007.

HPCToolkit: Tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, p.685.

H. Dong, . Ahn, R. Bronis, I. De-supinski, G. L. Laguna et al., Scalable temporal order analysis for large scale debugging Storage and Analysis, SC '09, Proceedings of the Conference on High Performance Computing NetworkingAll13b] Allinea. Allinea MAP [Amb03] M. Amblard. Conventions & management, pp.1-44, 2003.

S. Dieter-an-mey, C. Biersdorff, K. Bischof, D. Diethelm, M. Eschweiler et al., Score-P: A Unified Performance Measurement System for Petascale Applications Effectively presenting call path profiles of application performance, Proc. of the CiHPC: Competence in High Performance Computing, HPC Status Konferenz der Gauß- Allianz e.V. Parallel Processing Workshops (ICPPW) 39th International Conference on, pp.65-173, 2010.

C. Dorian, . Arnold, D. Gary, B. Pack, S. Augonnet et al., Tree-based overlay networks for scalable applications StarPU: a unified platform for task scheduling on heterogeneous multicore architectures Cédric Augonnet. Scheduling Tasks over Multicore machines enhanced with acelerators: a Runtime System's Perspective Complete Guides Series, Parallel and Distributed Processing Symposium IPDPS 2006. 20th InternationalAxe07] J. Axelson. Serial port complete [electronic resource]: COM ports, USB virtual COM ports, and ports for embedded systems, pp.62-75187, 2006.

D. H. Bailey, E. Barszcz, J. T. Barton et al., The NAS parallel benchmarks summary and preliminary results, Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, pp.158-165, 1991.

A. David, J. Bader, S. Berry, R. Kahan, E. J. Murphy et al., Search") http://www.graph500.org/ Specifications.html Version 1.1 Unraveling data race detection in the Intel Thread Checker, First Workshop on Software Tools for Multi-core Systems (STMCS), in conjunction with IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp.19-68, 2006.

K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham et al., The Agile Manifesto, p.33, 2001.

. Bcf-+-12-]-o, L. Bressand, A. Colombet, G. Fontaine, J. Harel et al., Hercule: A library of scientific BIBLIOGRAPHY data management Revue Scientifique et Technique de la Direction des applications militaires Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations, CHOCS (Numéro 41), pp.29-37, 2004.

]. Bec00 and . Becker, Extreme Programming Explained Embrace Change An Alan R. Apt Book Series Timestamp Synchronization of Concurrent Events Extending the scope of the controlled logical clock Identifying the root causes of wait states in large-scale parallel applications, of IAS Series, Forschungszentrum Jülich Parallel Processing (ICPP) 39th International Conference on, pp.34-71171, 2000.

D. Robert, . Blumofe, F. Christopher, . Joerg, C. Bradley et al., Cilk: An efficient multithreaded runtime system Replay-Based Synchronization of Timestamps in Event Traces of Massively Parallel Applications In Parallel Processing -Workshops Performance analysis of large-scale OpenMP and hybrid MPI/OpenMP applications with Vampir NG [Bor07] Dhruba Borthakur. The hadoop distributed file system: Architecture and design PERISCOPE: An Online-Based Distributed Performance Analysis Tool, International Conference on OpenMP Shared Memory Parallel ProgrammingBPG10] Shajulin Benedict, Ventsislav Petkov, and Michael Gerndt Tools for High Performance Computing, pp.23-212, 1995.

J. Besnard, M. Pérache, W. J. , D. Barthou, A. C. Rubial et al., Event Streaming for Online Performance Measurements Reduction Performance tuning of x86 openmp codes with maqao Timestamp synchronization for event traces of large-scale messagepassing applications, Tools for High Performance Computing Proceedings of the 14th European PVM/MPI Conference, pp.15-95, 2007.

P. J. Braam and P. Schwan, Lustre: The intergalactic file system, Ottawa Linux Symposium, page 50.

L. Boltanski and L. Thévenot, Les économies de la grandeur, 1987.

product/platforms/hw-extremcomp/hw-bullx-sup-node/bullx-s6010

D. Böhme, F. Wolf, and M. Geimer, Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications, Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International, p.2538, 2012.

E. D. Berger and B. G. Zorn, DieHard: probabilistic memory safety for unsafe languages, In ACM SIGPLAN Notices, vol.41, pp.158-168, 2006.

TIS Committee, Tool Interface Standard (TIS) Executable and Linking Format (ELF) specification, p.109, 1995.

D. D. Chamberlin and R. F. Boyce, SEQUEL: A structured English query language, Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control, pp.249-264, 1974.

[CB91] Bernadette Charron-Bost. Concerning the size of logical clocks in distributed systems.

. Inf, P. Process, M. Carribault, H. Pérache, and . Jourdren, 91)90055- M Enabling Low-Overhead Hybrid MPI/OpenMP Parallelism with MPC Thread-Local Storage Extension to Support Thread-Based MPI/OpenMP Applications, CPJ11] Patrick Carribault, Marc Pérache, and Hervé Jourdren IWOMPCri89] Flaviu Cristian. Probabilistic Clock Synchronization . Distributed Computing, pp.11-16, 1989.

[DCPJ12] Sylvain Didelot, Patrick Carribault, Marc Pérache, and William Jalby. Improving MPI Communication Overlap with Collaborative Polling. In EuroMPI.

L. Djoudi, D. Barthou, P. Carribault, W. Jalby, C. Lemuet et al., Exploring application performance: a new tool for a static/dynamic approach, LACSI Symposium, pp.41-49, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00141071

G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman et al., Dynamo: Amazon's highly available key-value store, Proceedings of the twenty-first ACM SIGOPS symposium on Operating systems principles, pp.205-220, 2007.

J. DeSouza, B. Kuhn, B. R. de Supinski, V. Samofalov, S. Zheltov et al., Automated, scalable debugging of MPI programs with Intel® Message Checker, Proceedings of the second international workshop on Software engineering for high performance computing system applications, SE-HPCS '05, pp.78-82, 2005.
DOI : 10.1145/1145319.1145342

W. E. Muller and . Nagel, Internal Timer Synchronization for Parallel Event Tracing GANESHA, a multi-usage with large cache NFSv4 server, Recent Advances in Parallel Virtual Machine and Message Passing Interface Linux Symposium, pp.202-209, 2007.

J. J. Dongarra, P. Luszczek, and A. Petitet, LINPACK Benchmark, Supercomputing, 1st International Conference Proceedings, pp.1546-55, 1987.
DOI : 10.1007/978-0-387-09766-4_155

B. de Oliveira Stein, J. Chassin de Kergommeaux, and G. Mounié, Pajé trace file format, Tech. rep, pp.63-112, 2003.

J. [. Danhof, M. Quisenberry, and . Zargham, Concurrency in blackboard systems, Proceedings of the third international conference on Industrial and engineering applications of artificial intelligence and expert systems , IEA/AIE '90, pp.109-113, 1990.
DOI : 10.1145/98784.98804

T. H. Dunigan, Hypercube clock synchronization, Concurrency: Practice and Experience, pp.257-268, 1992.

J. Elson, L. Girod, and D. Estrin, Fine-grained network time synchronization using reference broadcasts, ACM SIGOPS Operating Systems Review, vol.36, issue.SI, pp.147-163, 2002.
DOI : 10.1145/844128.844143

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.7382

T. El-Ghazawi and L. Smith, UPC: unified parallel C, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, pp.27-50, 2006.
DOI : 10.1002/0471478369

D. Lee, . Erman, R. Victor, . Lesserem88-]-r, T. Engelmore et al., The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty Computing Surveys Blackboard systems . Insight series in artificial intelligence Structure and Function of the CRYSALIS System, Proceedings of the 6th international joint conference on Artificial intelligence, pp.213-253, 1979.

D. Eschweiler, M. Wagner, M. Geimer, A. Knüpfer, E. Wolfgang et al., Open Trace Format 2-The next generation of scalable trace formats and support libraries Intelligence Service: construisez votre propre système expert, Proc. of the Intl. Conference on Parallel Computing (ParCo), pp.61417-421, 1987.

K. Fürlinger and M. Gerndt, Automated Performance Analysis Using ASL Performance Properties, Applied Parallel Computing. State of the Art in Scientific Computing, pp.390-397, 2007.
DOI : 10.1007/978-3-540-75755-9_48

J. Feo, J. Gilbert, K. Madduri, and B. Mann, HPCS Scalable Synthetic Compact Applications #2 Graph Analysis, p.19, 2006.

C. J. Fidge, Timestamps in message passing systems that preserve the partial ordering VampirTrace 5.14.3 User Manual Scalable massively parallel I/O to tasklocal files, Theoretical Computer Science TU Dresden Center for Information Services and High Performance Computing (ZIH) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.69-100, 1988.

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00647474

M. Gerndt and K. Fürlinger, Specification and detection of performance problems with ASL, Concurrency and Computation: Practice and Experience, vol.19, issue.11, pp.1451-1464, 2007.
DOI : 10.1002/cpe.1123

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

M. Gerndt, K. Fürlinger, and E. Kereku, Periscope: Advanced techniques for performance analysis, Proceedings of the 2005 International Conference on Parallel Computing, pp.15-26, 2005.

[. Ghemawat, H. Gobioff, S. Gerndt, E. Gropp, E. Lusk et al., The Google file system, GN00] Emden R Gansner and Stephen C North. An open graph visualization system and its applications to software engineering. Software Practice and Experience, pp.29-43, 1996.
DOI : 10.1145/1165389.945450

]. R. Gon09 and J. Gray, Digital Image Processing. Pearson Education The transaction concept: Virtues and limitations, Proceedings of the Very Large Database Conference, pp.167-144, 1981.

S. Paul, ]. A. Grahamgra03, and . Grama, Logical hardware debuggers for FPGA-based systems Introduction to Parallel Computing. Pearson Education, pp.61-87, 2001.

M. Geimer, P. Saviankou, A. Strube, Z. Szebenyi, F. Wolf et al., Further Improving the Scalability of the Scalasca Toolset, Applied Parallel and Scientific Computing, pp.463-473, 2012.
DOI : 10.1177/1094342006064482

J. L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988.

M. Geimer, F. Wolf, B. J. N. Wylie, E. Ábrahám, D. Becker et al., The Scalasca performance toolset architecture, Concurr. Comput.: Pract. Exper.

M. Geimer, F. Wolf, B. J. N. Wylie, and B. Mohr, A scalable tool architecture for diagnosing wait states in massively parallel applications, Parallel Computing, vol.35, issue.7, pp.375-388, 2009.
DOI : 10.1016/j.parco.2009.02.003

R. Gusella and S. Zatti, TEMPO: A Network Time Controller for a Distributed Berkeley UNIX System, 1983.

J.-L. Hainaut, Bases de données - 2e éd. - Concepts, utilisation et développement. Informatique. Dunod, 2012.

J. Hill and D. Culler, A wireless embedded sensor architecture for system-level optimization Adaptive Software Development: An Evolutionary Approach to Controlling Chaotic Systems, pp.70-103, 2000.

M.-A. Hermanns, S. Krishnamoorthy, and F. Wolf, A Scalable Replay-based Infrastructure for the Performance Analysis of One-sided Communication, Proc. of the 1st Intl. Workshop on High-performance Infrastructure for Scalable Tools (WHIST), p.65, 2011.

S. Huband, D. Mcdonald-kevin, A. Huck, D. Allen, and . Malony, A preliminary topological debugger for MPI programs Perfexplorer: A performance data mining framework for large-scale parallel computing, Cluster Computing and the Grid Proceedings. First IEEE/ACM International Symposium on Proceedings of the 2005 ACM/IEEE conference on Supercomputing, pp.422-429, 2001.

T. [. Halpin, A. Morgan-kevin, . Huck, D. Allen, R. Malony et al., Information Modeling and Relational Databases The Morgan Kaufmann Series in Data Management Systems Design and implementation of a parallel performance data management framework, Parallel Processing, 2005. ICPP 2005. International Conference on, pp.74-473, 2005.

T. Hilbrich, M. S. Müller, B. R. de Supinski, M. Schulz, and W. E. Nagel, GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1364-1375, 2012.
DOI : 10.1109/IPDPS.2012.123

K. A. Huck, A. D. Malony, S. Shende, and A. Morris, Scalable, automated performance analysis with TAU and PerfExplorer, Parallel Computing (ParCo), pp.1-8, 2007.

R. Hood, The p2d2 project: building a portable distributed debugger, Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, SPDT '96, pp.127-136, 1996.
DOI : 10.1145/238020.238058

I. J. Syst, Performance instrumentation and compiler optimizations for MPI/OpenMP applications, OpenMP Shared Memory Parallel Programming, pp.4-12, 2002.

A. Hunt and D. Thomas, The Pragmatic Programmer: From Journeyman to Master. Pearson Education, pp.67-96, 1999.

W. Huang, Z. Wang, and J. Ma, Design of DMPI on DAWNING-3000, Recent Advances in Parallel Virtual Machine and Message Passing Interface Lifecycle Process ModelINT10] Intel 64 and IA-32 Architectures Software Developer's ManualInt12a] Intel. Intel Debugger for Linux (IDB), pp.314-322, 1995.
DOI : 10.1007/3-540-45825-5_48

E. R. Jacobson, M. J. Brim, and B. P. Miller, A lightweight library for building scalable tools, Applied Parallel and Scientific Computing, pp.61-62, 2012.

H. Jagode, J. Dongarra, S. Alam, J. Vetter, W. Spear et al., A Holistic Approach for Performance Measurement and Analysis for Petascale Applications, In Computational Science - ICCS, pp.686-695, 2009.
DOI : 10.1007/978-3-642-01973-9_77

J.-M. Jézéquel, Building a global time on parallel machines, WDAG, pp.136-147, 1989.
DOI : 10.1007/3-540-51687-5_38

H. Jourdren, HERA: A Hydrodynamic AMR Platform for Multi-Physics Simulations, In Adaptive Mesh Refinement - Theory and Applications, volume 41 of Lecture Notes in Computational Science and Engineering, pp.154-158, 2005.

A. Jannesari and W. F. Tichy, On-the-fly race detection in multi-threaded programs, Proceedings of the 6th workshop on Parallel and distributed systems: testing, analysis, and debugging, pp.6-68, 2008.

A. Knüpfer, R. Brendel, H. Brunst, H. Mix, and W. E. Nagel, Introducing the open trace format (OTF), In Computational Science - ICCS, pp.526-533, 2006.

A. Knüpfer, H. Brunst, J. Doleschal, M. Jurenz, M. Lieber et al., The vampir performance analysis toolset Open trace format api specification version 1.1, Tools for High Performance Computing, pp.139-155, 2006.

B. Krammer, V. Himmler, and D. Lecomber, Coupling DDT and Marmot for debugging of MPI applications, Proc. of ParCo, pp.4-7, 2007.

M. J. Koop, T. Jones, and D. K. Panda, MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand, Parallel and Distributed Processing, pp.1-12, 2008.

L. V. Kale and S. Krishnan, CHARM++: a portable concurrent object oriented system based on C++, 1993.

A. Kleen, T. Update, P. Sushmitha, J. Kini, J. Liu et al., Fast and scalable barrier using rdma and multicast mechanisms for infiniband-based clusters, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.70-369, 2005.

B. Krammer, M. S. Müller, and M. M. Resch, MPI Application Development Using the Analysis Tool MARMOT, ICCS 2004, volume LNCS 3038, p.67, 2004.
DOI : 10.1007/978-3-540-24688-6_61

B. Krammer, M. S. Müller, and M. M. Resch, MPI I/O analysis and error detection with MARMOT, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.242-250, 2004.

E. Knapp, Deadlock detection in distributed databases, 1987.

B. Krammer and M. M. Resch, Correctness Checking of MPI One-Sided Communication Using Marmot, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.105-114, 2006.
DOI : 10.1007/11846802_21

S. Koliai, E. Zuckerman, M. Oseret, T. Ivascot, D. Moseley et al., A Balanced Approach to Application Performance Tuning, Languages and Compilers for Parallel Computing, pp.111-125, 2010.
DOI : 10.1007/978-3-642-13374-9_8

J. Labarta, StarSs: A programming model for the multicore era, PRACE Workshop "New Languages & Future Technology Prototypes" at the Leibniz Supercomputing Centre in Garching (Germany), p.24, 2010.

[Lam78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system, Commun. ACM, vol.21, issue.7, pp.558-565, 1978.

R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum et al., Applications of artificial intelligence for organic chemistry: The DENDRAL project.

V. R. Lesser and D. D. Corkill, The distributed vehicle monitoring testbed: A tool for investigating distributed problem solving networks. AI Magazine, vol.4, issue.3, p.15.

C. E. Leiserson, Fat-trees: Universal networks for hardware-efficient supercomputing. Computers, IEEE Transactions on, vol.34, issue.10, p.892, 1985.

D. Levi and S. A. Guccione, BoardScope: A debug tool for reconfigurable systems. Configurable Computing Technology and its uses in High Performance Computing, DSP and Systems Engineering, pp.239-246, 1998.

K. Li, P. Hudaklin90, A. Mark, M. Liao, D. W. Martonosi et al., Memory coherence in shared virtual memory systems, Proceedings of the Summer USENIX Conference Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, SPAA '09 Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, SPAA '99, pp.321-359, 1989.
DOI : 10.1145/75104.75105

C. W. Lee, A. D. Malony, and A. Morris, TAUmon: scalable online performance data analysis in TAU, Euro-Par 2010 Parallel Processing Workshops, pp.493-499, 2011.

D. Lorenz, P. Philippen, D. Schmidl, and F. Wolf, Profiling of OpenMP tasks with Score-P, Parallel Processing Workshops (ICPPW), 2012 41st International Conference on, pp.444-453, 2012.

[lt11] The libunwind team. The libunwind project, 2011.

J. Martin, Rapid Application Development. The James Martin productivity series, p.33, 1991.

[. Marick and F. Berman, New Models for Test De- velopment. http://www.exampler.com/ testing-com/writings/new-models.pdf Panorama: A portable, extensible parallel debugger, Mat88] Friedemann Mattern. Virtual Time and Global States of Distributed Systems ACM Sigplan Notices, pp.31-69, 1988.

P. J. Mucci, S. Browne, C. Deane, and G. Ho, PAPI: A portable interface to hardware performance counters, Proc. Department of Defense HPCMP Users Group Conference, p.64, 1999.

A. D. Malony, S. Biersdorff, S. Shende, H. Jagode, S. Tomov et al., Parallel performance measurement of heterogeneous parallel systems with GPUs, Parallel Processing (ICPP), 2011 International Conference on, pp.176-185, 2011.

P. Barton, M. D. Miller, J. M. Callaghan, J. K. Cargille, R. Hollingsworth et al., The Paradyn parallel performance measurement tool Object-oriented software construction. Prentice-Hall International Series in Computer Science Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access The paradyn parallel performance tools and pvm, MF08] MPI-Forum. MPI: A message passing interface standard, version 2.1 EuroMPI, pp.37-46, 1994.

D. L. Mills, Internet Time Synchronization: the Network Time Protocol, IEEE Transactions on Communications, vol.39, pp.1482-1493, 1991.

A. Morris, A. D. Malony, S. Shende, and K. Huck, Design and Implementation of a Hybrid Parallel Performance Measurement System, 2010 39th International Conference on Parallel Processing, p.65, 2010.
DOI : 10.1109/ICPP.2010.57

B. Mohr, A. D. Malony, S. Shende, and F. Wolf, Towards a performance tool interface for OpenMP: An approach based on directive rewriting. Citeseer, p.65, 2001.

[. Mohr, D. Allen, S. Malony, F. Shende, ]. J. Wolfmoi86 et al., Design and prototype of a performance tool interface for OpenMP, Intelligence et Conception . Nouvelle encyclopédie des sciences et des techniques. Fondation Diderot, pp.105-128, 1986.
DOI : 10.1023/A:1015741304337

x86: rewrite SMP TSC sync code, Linux Kernel (commit 95492e4646e5de8b43d9a7908d6177fb737b61f0).

G. E. Moore, Cramming more components onto integrated circuits, 1965.

[Mor11] Stéphanie Moreaud. Mouvement de données et placement des tâches pour les communications haute performance sur machines hiérarchiques.

D. Allen, D. A. Malony, and . Reed, Models for Performance Perturbation Analysis Performance measurement intrusion and perturbation analysis. Parallel and Distributed Systems, Workshop on Parallel and Distributed Debugging, pp.15-25, 1991.

A. D. Malony, S. S. Shende, and A. Morris, Phase-based parallel performance profiling, Proceedings of the PARCO 2005 conference, p.66, 2005.

A. Morris, W. Spear, A. D. Malony, and S. Shende, Observing Performance Dynamics Using Parallel Profile Snapshots, Euro-Par 2008 - Parallel Processing, pp.162-171, 2008.
DOI : 10.1007/978-3-540-85451-7_18

E. Maillet and C. Tron, On Efficiently Implementing Global Time for Performance Evaluation on Multiprocessor Systems, Journal of Parallel and Distributed Computing, vol.28, issue.1, 1995.
DOI : 10.1006/jpdc.1995.1090

H. P. Nii, N. Aiello, and J. Rice, Experiments on Cage and Poligon: Measuring the Performance of Parallel Blackboard Systems, p.73, 1990.
DOI : 10.1016/B978-1-55860-092-8.50018-6

N. Neophytou and P. Evripidou, Net-dbx: a web-based debugger of MPI programs over low-bandwidth lines. Parallel and Distributed Systems, pp.986-995, 2001.

[NET13] NetCDF: Network Common Data Form.

A. J. Anton, D. Virginia-lo-allen, A. Malony, D. Morris, B. Arnold et al., Readings from the AI magazine Distributed shared memory: A survey of issues and algorithms A framework for scalable, parallel performance monitoring using tau and mrnet How to shadow every byte of memory used by a program Valgrind: a framework for heavyweight dynamic binary instrumentation, NMM + 08] Aroon Nataraj International Workshop on Scalable Tools for High-End Computing Proceedings of the 3rd international conference on Virtual execution environments Office of Government Commerce. Managing Successful Projects with PRINCE2. Prince Guidance Series. H.M. Stationery Office, pp.7252-60, 1988.

P. Planas, M. Rosa, E. Badia, J. Ayguadé, and . Labarta, Sun Studio 12: Thread Analyzer User's Guide. http://docs.oracle.com/cd/ E19205-01/820-0619/820-0619.pdf Hierarchical task-based programming with StarSs MPC: A unified parallel framework for HPC Revue Scientifique et Technique de la Direction des applications militaires, The OpenACC TM Application Programming Interface CHOCS (Numéro 41)PCJ09] Marc Pérache, Patrick Carribault, and Hervé Jourdren. MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption PVM/MPIPCJ10] Marc Pérache, Patrick Carribault, and Hervé Jourdren. User level DB: a debugging API for user-level thread libraries IPDPS Workshops, pp.24-68284, 2007.

S. J. Plimpton and K. D. Devine, MapReduce in MPI for large-scale graph algorithms, Parallel Computing, vol.37, issue.9, pp.610-632, 2011.

B. Pérache, Electric Fence malloc Contribution à l'élaboration d'environnements de programmation dédiés au calcul scientifique hautes performances, Thèse de doctorat, spécialité informatique, pp.68-14617, 2001.

S. Pervez, G. Gopalakrishnan, M. Robert, R. Kirby, R. Palmer et al., Practical model-checking method for verifying correctness of mpi programs MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Recent Advances in Parallel Virtual Machine and Message Passing Interface Euro-Par, pp.344-353, 1995.

M. Poppendieck and T. Poppendieck, Lean software development: an agile toolkit. The Agile Software Development Series, p.34, 2003.

P. C. Roth, D. C. Arnold, and B. P. Miller, MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, pp.21-61, 2003.

G. Ravipati, A. R. Bernat, N. Rosenblum, B. P. Miller et al., Toward the deconstruction of Dyninst, p.64, 2007.

S. P. Reiss, Trace-based debugging, Automated and Algorithmic Debugging, pp.305-314, 1993.

S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Series in Artificial Intelligence, 2010.

P. Charles-rothrot11 and ]. V. Rota, Scalable on-line automated performance diagnosis Gestion de projet agile: Avec Scrum, Lean, eXtreme Programming, Architecte logiciel . Eyrolles, pp.62-93, 1975.

W. W. Royce, Managing the development of large software systems, Proceedings of IEEE WESCON, p.31, 1970.

B. Robert, R. Ross, and . Thakur, PVFS: A parallel file system for Linux clusters, Proceedings of the 4th Annual Linux Showcase and Conference Beedle. Agile Software Development with Scrum. Series in agile software development. Pearson Education International, pp.391-430, 2000.

K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, AddressSanitizer: A fast address sanity checker

E. Schoen, The CAOS system, p.73, 1986.

[. Scemama, M. Caffarel, E. Oseret, W. Schulz, R. Bronis et al., Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond PNMPI tools: a whole lot greater than the sum of their parts The Green Index: A Metric for Evaluating System- Wide Energy Efficiency in HPC Systems, Proceedings of the 2007 ACM/IEEE conference on Supercomputing 8th IEEE Workshop on High-Performance, Power- Aware Computing (HPPAC), pp.66-126, 2007.

. Sgs-+-11-]-zoltán, T. Szebenyi, M. Gamblin, . Schulz, R. Bronis et al., Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs GPFS: A shared-disk file system for large computing clusters, IPDPS Proceedings of the First USENIX Conference on File and Storage Technologies, pp.65-231, 2002.

E. H. Shortliffe, Computer-based medical consultations: MYCIN, p.72, 1976.

K. Serebryany and T. Iskhodzhanov, ThreadSanitizer, Proceedings of the Workshop on Binary Instrumentation and Applications, WBIA '09, pp.62-71, 2009.
DOI : 10.1145/1791194.1791203

]. H. Sim97, . Simon-stephan, . Seidl, R. Knüpfer, and . Müller-pfefferkorn, Administrative Behavior VTF3-A Fast Vampir Trace File Low-Level Management Library, pp.177-64, 1997.

R. Sekhar, A. D. Sarukkai, . Malonysmh98-]-sameer-shende, D. Allen, . Malony et al., Perturbation Analysis of High Level Instrumentation for SPMD Programs Dynamic performance callstack sampling: Merging TAU and DAQV An approach to creating performance visualizations in a parallel profile analysis tool, PPOPP Applied Parallel Computing Large Scale Scientific and Industrial Problems Euro- Par 2011: Parallel Processing Workshops, pp.44-53, 1993.

J. Timothy, . Sheehan, D. Allen, . Malony, S. Sameer et al., A runtime monitoring framework for the tau profiling system Survey of deadlock detection in distributed concurrent programming environments and its application to real-time systems and Ada, Computing in Object-Oriented Parallel Environments, pp.170-181, 1991.

L. Sterling, E. Shapiro, and M. Eytan, The Art of Prolog, IEEE Expert, vol.2, issue.2, p.72, 1990.
DOI : 10.1109/MEX.1987.4307074

G. L. Steele, Common LISP: the language. Digital Press, p.72, 1990.

H. Sut05, . Sutter, J. Brian, and . Wylie, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software Fengguang Song and Felix Wolf. CUBE User Manual Performance Analysis of Long-running Applications, Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pp.20-65, 2004.

R. Sze12-]-zoltán-szebenyi-nathan, L. Tallent, J. M. Adhianto, and . Mellor-crummey, Capturing Parallel Performance Dynamics ISBN 978-3- 89336-798-6 Scalable identification of load imbalance in parallel executions using call path profiles, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.65-66, 2010.

M. Tchiboukdjian, P. Carribault, and M. Pérache, Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.366-377, 2012.
DOI : 10.1109/IPDPS.2012.42

The LAM/MPI Team, XMPI - A Run/Debug GUI, http://www.lam-mpi.org/software, 2006.

[Tea13a] Redis Team. Redis, open source, BSD licensed, advanced key-value store.

R. Thakur, E. Lusk, and W. Gropp, Users guide for ROMIO: A high-performance, portable MPI-IO implementation, 1997.

N. R. Tallent and J. M. Mellor-Crummey, Effective performance measurement and analysis of multithreaded applications, In ACM Sigplan Notices, vol.44, p.229.

N. R. Tallent, J. M. Mellor-Crummey, and A. Porterfield, Analyzing lock contention in multithreaded applications, In ACM Sigplan Notices, vol.45, pp.269-280, 2010.

H. Takeuchi and I. Nonaka, The New New Product Development Game, Harvard Business Review Curie Supercomputer, vol.500, pp.34-58, 1986.

P. Veríssimo, A. Casimiro, and L. Rodrigues, Using Atomic Broadcast to Implement a posteriori Agreement for Clock Synchronization, SRDS, pp.115-124, 1993.

J. S. Vetter and B. R. de Supinski, Dynamic software testing of MPI applications with Umpire, Supercomputing, ACM/IEEE 2000 Conference, pp.51-51, 2000.

J. S. Vetter, Contemporary High Performance Computing: From Petascale Toward Exascale. Chapman and Hall/CRC Computational Science Series.

J. S. Vetter and M. O. McCracken, Statistical scalability analysis of communication operations in distributed applications, Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, p.67, 2001.

P. Veríssimo, L. Rodrigues, and A. Casimiro, CesiumSpray: a Precise and Accurate Global Time Service for Large-scale Systems. Real-Time Systems, pp.243-294, 1997.

F. Wolf and N. Bhatia, EARL-API Documentation, p.64, 2004.

C. Wu, A. Bolmarcich, M. Snir, D. Wootton, F. Parpia et al., From trace generation to visualization: A performance framework for distributed parallel systems Sequential performance analysis with callgrind and kcachegrind, Tools for High Performance Computing, pp.50-50, 2000.

J. N. Brian, M. Wylie, B. Geimer, D. Mohr, Z. Böhme et al., Large-scale performance analysis of Sweep3D with the Scalasca toolset. Parallel Processing Letters Hadoop: The definitive guide. O'Reilly Media, Inc., 2012 Cybernetics: Or, Control and Communication in the Animal and the Machine. The @MIT paperback series: Massachusetts Institute of Technology Conservative numerical methods for a two-temperature resistive MHD model with self-generated magnetic field term. In CEMRACS'10 research achievements: Numerical modeling of fusion The Machine That Changed the World: The Story of Lean Production? Toyota's Secret Weapon in the Global Car Wars That Is Now Revolutionizing World Industry, Hierarchical Multi-expert Signal Understanding,". Blackboard Systems, vol.20, issue.4, pp.397-414, 1961.

J. Q. Wilson and G. L. Kelling, The police and neighborhood safety: Broken windows, p.49, 1982.

F. Wolf and B. Mohr, Automatic Performance Analysis of Hybrid MPI/OpenMP Applications, Proc. of 11th Euromicro Workshop on Parallel Distributed and Network-Based Processing (PDP), pp.13-22, 2003.

F. Wolf and B. Mohr, EPILOG binary trace-data format.

F. Wolf, Automatic Performance Analysis on Parallel Computers with SMP Nodes.

M. Wolff, Analyse mathématique et numérique du système de la magnétohydrodynamique résistive avec termes de champ magnétique autogénéré, p.153, 2011.

H. J. Wong and A. P. Rendell, The design of MPI based distributed shared memory systems to support OpenMP on clusters, Cluster Computing, IEEE International Conference on, pp.231-240, 2007.

[ws13] Amazon web services. Amazon Simple Storage Service (Amazon S3).

A. Zeller, Why programs fail: a guide to systematic debugging, pp.47-48, 2009.

O. Zaki, E. Lusk, W. Gropp, and D. Swider, Toward Scalable Performance Visualization with Jumpshot, International Journal of High Performance Computing Applications, vol.13, issue.3, pp.277-288, 1999.
DOI : 10.1177/109434209901300310