V. S. Adve, R. Bagrodia, J. C. Browne, E. Deelman, A. Dube et al., POEMS: End-to-end performance design of large parallel adaptive computational systems, IEEE Transactions on Software Engineering, vol.26, p.1027, 2000.

A. Aggarwal, A. K. Chandra, and M. Snir, On communication latency in PRAM computations, Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, SPAA '89, pp.11-21, 1989.
DOI : 10.1145/72935.72937

G. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.
DOI : 10.1145/1465482.1465560

H. H. Ammar, S. M. Islam, M. H. Ammar, and S. Deng, Performance modeling of parallel algorithms, ICPP, issue.3, pp.68-71, 1990.

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS parallel benchmarks, tech. rep., The International Journal of Supercomputer Applications, 1991.

P. Balaji, S. Bhagvat, D. K. Panda, R. Thakur, and W. Gropp, Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand, 2007 International Conference on Parallel Processing (ICPP 2007), p.73, 2007.
DOI : 10.1109/ICPP.2007.14

V. E. Benes, Mathematical Theory of Connecting Networks and Telephone Traffic, 1965.

M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang et al., The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers, International Journal of High Performance Computing Applications, vol.3, issue.3, pp.3-9, 1989.
DOI : 10.1177/109434208900300302

F. Bodin, P. Beckman, D. Gannon, J. Gotwals, S. Narayana et al., Sage++: An object-oriented toolkit and class library for building Fortran and C++ restructuring tools, The second annual object-oriented numerics conference, pp.122-136, 1994.

S. Bokhari, A Shortest Tree Algorithm for Optimal Assignments Across Space and Time in a Distributed Processor System, IEEE Transactions on Software Engineering, vol.7, issue.6, pp.583-589, 1981.
DOI : 10.1109/TSE.1981.226469

S. H. Bokhari, On the Mapping Problem, IEEE Transactions on Computers, vol.30, issue.3, pp.207-214, 1981.
DOI : 10.1109/TC.1981.1675756

J. Bourgeois and F. Spies, Performance Prediction of an NAS Benchmark Program with ChronosMix Environment, Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, pp.208-216, 2000.
DOI : 10.1007/3-540-44520-X_28

S. D. Brookes, On the relationship of CCS and CSP, Proceedings of the 10th Colloquium on Automata, Languages and Programming, pp.83-96, 1983.
DOI : 10.1007/BFb0036899

S. A. Browning, The tree machine: a highly concurrent computing environment, PhD thesis, California Institute of Technology, 1980.

H. Casanova, A. Legrand, and M. Quinson, SimGrid: A Generic Framework for Large-Scale Distributed Experiments, Tenth International Conference on Computer Modeling and Simulation (UKSim 2008), 2008.
DOI : 10.1109/UKSIM.2008.28
URL : https://hal.archives-ouvertes.fr/inria-00260697

L. Chai, Q. Gao, and D. K. Panda, Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), pp.471-478, 2007.
DOI : 10.1109/CCGRID.2007.119

C. H. Koelbel, Compiling Programs for Distributed Memory Machines, PhD thesis, 1990.

M. J. Clement, M. R. Steed, and P. Crandall, Network performance modeling for PVM clusters, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '96, 1996.
DOI : 10.1145/369028.369040

C. Clos, A Study of Non-Blocking Switching Networks, Bell System Technical Journal, vol.32, issue.2, pp.406-424, 1953.
DOI : 10.1002/j.1538-7305.1953.tb01433.x

R. Cole and O. Zajicek, The APRAM: incorporating asynchrony into the PRAM model, SPAA '89: Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, pp.169-178, 1989.

B. Cornea and J. Bourgeois, Simulation of a P2P Parallel Computing Environment - Introducing dPerf, A Tool for Predicting the Performance of Parallel MPI or P2P-SAP Applications, 2010.

P. J. Courtois, F. Heymans, and D. L. Parnas, Concurrent control with "readers" and "writers", Communications of the ACM, vol.14, issue.10, pp.667-668, 1971.
DOI : 10.1145/362759.362813

W. J. Dally and C. L. Seitz, Deadlock-Free Message Routing in Multiprocessor Interconnection Networks, IEEE Transactions on Computers, vol.36, issue.5, pp.547-553, 1987.
DOI : 10.1109/TC.1987.1676939

S. Dasgupta, A hierarchical taxonomic system for computer architectures, Computer, vol.23, issue.3, p.64, 1990.
DOI : 10.1109/2.50273

E. W. Dijkstra, Solution of a problem in concurrent programming control, Communications of the ACM, vol.8, issue.9, p.569, 1965.
DOI : 10.1145/365559.365617

E. W. Dijkstra, Hierarchical ordering of sequential processes, Acta Informatica, vol.1, issue.2, pp.115-138, 1971.
DOI : 10.1007/BF00289519

E. W. Dijkstra, Cooperating sequential processes, The origin of concurrent programming : from semaphores to remote procedure calls, pp.65-138, 2002.

M. Dubois and F. A. Briggs, Performance of Synchronized Iterative Processes in Multiprocessor Systems, IEEE Transactions on Software Engineering, vol.8, issue.4, pp.419-431, 1982.
DOI : 10.1109/TSE.1982.235576

J. A. Fisher and S. Freudenberger, Predicting conditional branch directions from previous runs of a program, ASPLOS-V: Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, pp.85-95, 1992.

M. J. Flynn, Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972.
DOI : 10.1109/TC.1972.5009071

A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM - Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing, Computers in Physics, vol.9, issue.6, 1994.
DOI : 10.1063/1.4823450

W. K. Giloi, Towards a taxonomy of computer architecture based on the machine data type view, ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture, pp.6-15, 1983.

W. K. Giloi, Parallel supercomputer architectures and their programming models, Parallel Computing, vol.20, issue.10-11, pp.1443-1470, 1994.
DOI : 10.1016/0167-8191(94)90050-7

L. M. Goldschlager, A unified approach to models of synchronous parallel machines, Proceedings of the tenth annual ACM symposium on Theory of computing, STOC '78, pp.89-94, 1978.
DOI : 10.1145/800133.804336

W. Gropp and E. Lusk, Reproducible Measurements of MPI Performance Characteristics, Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.11-18, 1999.
DOI : 10.1007/3-540-48158-3_2

D. Grove and P. Coddington, Precise MPI performance measurement using MPIBench, Proceedings of HPC Asia, 2001.

D. A. Grove and P. Coddington, Communication benchmarking and performance modelling of MPI programs on cluster computers, Parallel and Distributed Processing Symposium, International, pp.15-249, 2004.

S. D. Hammond, G. R. Mudalige, J. A. Smith, S. A. Jarvis, J. A. Herdman et al., WARPP: a toolkit for simulating high-performance parallel scientific codes, Proceedings of the Second International ICST Conference on Simulation Tools and Techniques, p.19, 2009.
DOI : 10.4108/ICST.SIMUTOOLS2009.5753

J. Handy, The cache memory book, 1993.

T. J. Harris, A survey of PRAM simulation techniques, ACM Computing Surveys, vol.26, issue.2, pp.187-206, 1994.
DOI : 10.1145/176979.176984

M. T. Heath and J. Finger, ParaGraph: A Tool for Visualizing Performance of Parallel Programs, 1992.

J. Hennessy and D. Patterson, Computer Architecture - A Quantitative Approach, 2003.

T. Heywood and S. Ranka, A practical hierarchical model of parallel computation, Parallel and Distributed Processing, Proceedings of the Third IEEE Symposium on, pp.18-25, 1991.

J. M. Hill, B. McColl, D. C. Stefanescu, M. W. Goudreau, K. Lang et al., The BSP programming library, 1998.

C. A. R. Hoare, Communicating sequential processes, Communications of the ACM, vol.26, issue.1, pp.100-106, 1983.
DOI : 10.1145/357980.358021

R. W. Hockney and C. Jesshope, Parallel computers: architecture, programming and algorithms, 1981.

T. Hoefler, T. Schneider, and A. Lumsdaine, Multistage switches are not crossbars: Effects of static routing in high-performance networks, 2008 IEEE International Conference on Cluster Computing, 2008.
DOI : 10.1109/CLUSTR.2008.4663762

S. Horwitz, T. Reps, and D. Binkley, Interprocedural slicing using dependence graphs, PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation, pp.35-46, 1988.

R. Ibbett, G. Chochia, P. Coe, M. Cole, P. Heywood et al., Algorithms, architectures and models of computation, technical report ECS-CSG-22-96, 1996.

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1145/24039.24041

J. Postel, Transmission Control Protocol, USC/Information Sciences Institute, Sept. 1981.

J. Bourgeois, Prédiction de performance statique et semi-statique dans les systèmes répartis hétérogènes, PhD thesis, 2000.

B. H. Juurlink and H. Wijshoff, The E-BSP model: Incorporating general locality and unbalanced communication into the BSP model, Proc. Euro-Par '96, pp.339-347, 1996.
DOI : 10.1007/BFb0024721

A. Kapelnikov, R. R. Muntz, and M. Ercegovac, A modeling methodology for the analysis of concurrent systems and computations, Journal of Parallel and Distributed Computing, vol.6, issue.3, pp.568-597, 1989.
DOI : 10.1016/0743-7315(89)90007-5

D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings, Predictive performance and scalability modeling of a large-scale application, Supercomputing '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pp.37-37, 2001.

A. Krall, Improving semi-static branch prediction by code replication, PLDI '94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pp.97-106, 1994.

F. Leighton, Introduction to parallel algorithms and architectures: Arrays, trees, and hypercubes, 1992.

C. E. Leiserson, Fat-trees: universal networks for hardware-efficient supercomputing, IEEE Trans. Comput., vol.34, issue.10, pp.892-901, 1985.

D. Lugones, D. Franco, and E. Luque, Dynamic routing balancing on InfiniBand networks, The Journal of Computer Science and Technology (JCS&T), vol.8, pp.104-110, 2008.

P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg et al., Simics: A full system simulation platform, Computer, vol.35, issue.2, p.50, 2002.
DOI : 10.1109/2.982916

E. Maillet, Le traçage logiciel d'applications parallèles : conception et ajustement de qualité, PhD thesis, 1992.

V. W. Mak and S. Lundstrom, Predicting performance of parallel computations, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.257-270, 1990.
DOI : 10.1109/71.80155

M. Martinasso, Analyse et Modélisation des Communications Concurrentes dans les Réseaux Haute Performance, PhD thesis, École doctorale Mathématiques, Sciences et Technologies de l'Information, 2007.

W. McColl, Foundations of time-critical scalable computing, Proceedings of the 15th IFIP World Computer Congress, Österreichische Computer Gesellschaft, pp.93-107, 1998.

J. M. Mellor-Crummey, V. S. Adve, B. Broom, D. G. Chavarría-Miranda, R. J. Fowler et al., Advanced optimization strategies in the Rice dHPF compiler, Concurrency and Computation: Practice and Experience, vol.26, issue.5, pp.14-741, 2002.
DOI : 10.1002/cpe.647

S. Moore, D. Cronk, F. Wolf, A. Purkayastha, P. Teller et al., Performance Profiling and Analysis of DoD Applications Using PAPI and TAU, 2005 Users Group Conference (DOD-UGC'05), p.394, 2005.
DOI : 10.1109/DODUGC.2005.50

P. Mucci and K. London, The MPBench report, tech. rep., 1998.

P. J. Mucci and S. Moore, PAPI users group, SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, p.43, 2006.

M. S. Müller, M. van Waveren, R. Liebermann, B. Whitney, H. Saito et al., SPEC MPI2007 - An application benchmark for clusters and HPC systems, ISC, 2007.

D. Quinlan, R. Vuduc, T. Panas, J. Härdtlein, and A. Sæbjørnsen, Support for whole-program analysis and verification of the one-definition rule in C++, Static Analysis Summit, 2006.

R. W. Hockney, The communication challenge for MPP: Intel Paragon and Meiko CS-2, Parallel Computing, pp.389-398, 1994.
DOI : 10.1016/S0167-8191(06)80021-9

R. Reussner, P. Sanders, and J. L. Träff, SKaMPI: A Comprehensive Benchmark for Public Benchmarking of MPI, Scientific Programming, vol.10, issue.1, pp.55-65, 2002.
DOI : 10.1155/2002/202839

D. Ridge, D. Becker, P. Merkey, and T. Sterling, Beowulf: harnessing the power of parallelism in a pile-of-PCs, 1997 IEEE Aerospace Conference, pp.79-91, 1997.
DOI : 10.1109/AERO.1997.577619

A. W. Roscoe, C. A. R. Hoare, and R. Bird, The Theory and Practice of Concurrency, 1997.

J. Russell, Program slicing literature survey, 2001.

S. S. Shende and A. D. Malony, The Tau Parallel Performance System, International Journal of High Performance Computing Applications, vol.20, issue.2, pp.287-311, 2006.
DOI : 10.1177/1094342006064482

R. H. Saavedra and A. Smith, Analysis of benchmark characteristics and benchmark performance prediction, tech. rep., EECS Department, 1992.

J. C. Sancho, A. Robles, and J. Duato, Effective strategy to compute forwarding tables for InfiniBand networks, International Conference on Parallel Processing, 2001, pp.48-57, 2001.
DOI : 10.1109/ICPP.2001.952046

V. Sarkar, Determining average program execution times and their variance, SIGPLAN Not., pp.298-312, 1989.

M. Schordan and D. J. Quinlan, A source-to-source architecture for user-defined optimizations, JMLC, pp.214-223, 2003.

M. D. Schroeder, A. D. Birrell, M. Burrows, H. Murray, R. M. Needham et al., Autonet: a high-speed, self-configuring local area network using point-to-point links, IEEE Journal on Selected Areas in Communications, vol.9, issue.8, 1991.
DOI : 10.1109/49.105178

J. T. Schwartz, Ultracomputers, ACM Transactions on Programming Languages and Systems, vol.2, issue.4, pp.484-521, 1980.
DOI : 10.1145/357114.357116

H. Shan, K. Antypas, and J. Shalf, Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2008.
DOI : 10.1109/SC.2008.5222721

G. M. Shipman, T. S. Woodall, R. L. Graham, A. B. Maccabe, and P. Bridges, InfiniBand scalability in Open MPI, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006.
DOI : 10.1109/IPDPS.2006.1639335

T. Skeie, O. Lysne, and I. Theiss, Layered shortest path (LASH) routing in irregular system area networks, Proceedings 16th International Parallel and Distributed Processing Symposium, p.162, 2002.
DOI : 10.1109/IPDPS.2002.1016559

D. B. Skillicorn, J. M. Hill, and W. McColl, Questions and Answers about BSP, Scientific Programming, pp.249-274, 1997.
DOI : 10.1155/1997/532130

Q. O. Snell, A. R. Mikler, and J. Gustafson, NetPIPE: A network protocol independent performance evaluator, IASTED International Conference on Intelligent Information Management and Systems, 1996.

M. R. Steed and M. Clement, Performance prediction of PVM programs, Proceedings of International Conference on Parallel Processing, pp.803-807, 1996.
DOI : 10.1109/IPPS.1996.508151

S. Sur, M. J. Koop, and D. Panda, MPI and communication: High-performance and scalable MPI over InfiniBand with reduced memory usage, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC '06, p.105, 2006.
DOI : 10.1145/1188455.1188565

Z. Szebenyi, B. J. Wylie, and F. Wolf, Scalasca Parallel Performance Analyses of PEPC, Proc. of the 1st Workshop on Productivity and Performance (PROPER) in conjunction with Euro-Par 2008, pp.305-314, 2009.
DOI : 10.1007/978-3-540-69814-2_8

F. Tip, A survey of program slicing techniques, Journal of Programming Languages, vol.3, pp.121-189, 1995.

P. de la Torre and C. Kruskal, Submachine locality in the bulk synchronous setting (extended abstract), Euro-Par '96: Proceedings of the Second International Euro-Par Conference on Parallel Processing, Volume II, pp.352-358, 1996.

R. A. Towle, Control and data dependence for program transformations, PhD thesis, 1976.

D. Towsley, C. Rommel, and J. Stankovic, Analysis of fork-join program response times on multiprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.286-303, 1990.
DOI : 10.1109/71.80157

T. Tsuei and M. K. Vernon, Diagnosing parallel program speedup limitations using resource contention models, ICPP (1), B. W. Wah, ed., pp.185-189, 1990.

A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, pp.230-265, 1936.

J. Uniejewski, SPEC benchmark suite: designed for today's advanced systems, Tech. rep. 1, SPEC Newsletter, 1989.

L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol.33, issue.8, p.103, 1990.
DOI : 10.1145/79173.79181

P. Velho and A. Legrand, Accuracy study and improvement of network simulation in the SimGrid framework, Proceedings of the Second International ICST Conference on Simulation Tools and Techniques, 2009.
DOI : 10.4108/ICST.SIMUTOOLS2009.5592

J. Vienne and M. Martinasso, Evaluation et modélisation des communications concurrentes, AEP09, 2008.

J. Vienne, M. Martinasso, and J. Méhaut, Predictive models for bandwidth sharing in high performance clusters, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00953618

A. Vishnu, A. R. Mamidala, H. Jin, and D. K. Panda, Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM, 19th IEEE International Parallel and Distributed Processing Symposium, pp.19-296, 2005.
DOI : 10.1109/IPDPS.2005.339

D. F. Vrsalovic, D. P. Siewiorek, Z. Z. Segall, and E. Gehringer, Performance prediction and calibration for a class of multiprocessors, IEEE Transactions on Computers, vol.37, issue.11, pp.1353-1365, 1988.
DOI : 10.1109/12.8701

V. S. Adve, Analyzing the Behavior and Performance of Parallel Programs, PhD thesis, 1993.

W. Gropp and E. Lusk, User's Guide for MPE: Extensions for MPI Programs.

H. Wabnig and G. Haring, PAPS - A testbed for performance prediction of parallel applications, Parallel Computing, vol.22, issue.13, pp.1837-1851, 1997.
DOI : 10.1016/S0167-8191(96)00080-4

M. Warren, D. J. Becker, M. P. Goda, J. K. Salmon, and T. Sterling, Parallel supercomputing with commodity components, International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pp.1372-1381, 1997.

M. Weiser, Program Slicing, IEEE Transactions on Software Engineering, vol.10, issue.4, pp.352-357, 1984.
DOI : 10.1109/TSE.1984.5010248

S. A. Williams, Programming models for parallel systems, 1990.

F. Wolf, B. J. Wylie, E. Ábrahám, D. Becker, W. Frings et al., Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications, Proc. of the 2nd HLRS Parallel Tools Workshop, pp.157-167, 2008.
DOI : 10.1007/978-3-540-68564-7_10

C. Young and M. D. Smith, Improving the accuracy of static branch prediction using branch correlation, ASPLOS-VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, pp.232-241, 1994.

D. Zaparanuks, M. Jovic, and M. Hauswirth, Accuracy of performance counter measurements, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.23-32, 2009.
DOI : 10.1109/ISPASS.2009.4919635