Heartbeat: A timeout-free failure detector for quiescent reliable communication, Marios Mavronicolas and Philippas Tsigas, Proceedings of the 11th Workshop on Distributed Algorithms (WDAG'97), pp.126-140, 1997. ,
DOI : 10.1007/BFb0030680
VOMS, an Authorization System for Virtual Organizations, European Across Grids Conference, pp.33-40 ,
DOI : 10.1007/978-3-540-24689-3_5
A System for Public-Resource Computing and Storage, Proceedings of the 5th International Workshop on Grid Computing, pp.4-10, 2004. ,
Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology, Proceedings of the 5th International Symposium on Parallel and Distributed Processing and Applications, pp.471-482, 2007. ,
DOI : 10.1007/978-3-540-74742-0_43
Jelena Pjesivac-Grbovic and Jack Dongarra. « Scalable fault tolerant protocol for parallel runtime environments », Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'06), pp.141-149, 2006. ,
« A Scalable Failure Recovery Model for Tree-based Overlay Networks » ,
Tree-based overlay networks for scalable applications, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006. ,
DOI : 10.1109/IPDPS.2006.1639493
« Monte Carlo Grid Application for Electron Transport », Proceedings of the 6th International Conference on Computational Science (ICCS'06), Part III, pp.616-623, 2006. ,
Madeleine II: a portable and efficient communication library for high-performance cluster computing, Proceedings IEEE International Conference on Cluster Computing. CLUSTER 2000, pp.607-626, 2002. ,
DOI : 10.1109/CLUSTR.2000.889004
Dependability and Its Threats: A Taxonomy, Building the Information Society, IFIP 18th World Computer Congress, Topical Sessions, pp.22-27, 2004. ,
DOI : 10.1007/978-1-4020-8157-6_13
Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing, pp.11-33, 2004. ,
DOI : 10.1109/TDSC.2004.2
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.219.5446
Integration of GRID Superscalar and GridWay Metascheduler with the DRMAA OGF Standard, Proceedings of the 14th European Conference on Parallel and Distributed Computing, pp.445-455, 2008. ,
DOI : 10.1007/978-3-540-85451-7_49
Running Parallel Applications with Topology-Aware Grid Middleware, 2009 Fifth IEEE International Conference on e-Science, 2009. ,
DOI : 10.1109/e-Science.2009.48
URL : https://hal.archives-ouvertes.fr/hal-00684522
NetSolve/D: A Massively Parallel Grid Execution System for Scalable Data Intensive Collaboration, 19th IEEE International Parallel and Distributed Processing Symposium, 2005. ,
DOI : 10.1109/IPDPS.2005.298
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations, International Journal of High Performance Computing and Networking, vol.1, issue.1/2/3, pp.91-99, 2004. ,
DOI : 10.1504/IJHPCN.2004.007569
Frédéric Magniette, Vincent Néri and Anton Selikhov. « MPICH-V : Toward a Scalable Fault Tolerant MPI for Volatile Nodes », High Performance Networking and Computing (SC2002), 2002. ,
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009. ,
DOI : 10.1016/j.jpdc.2008.12.002
Gilles Fedak and Franck Cappello. « Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment », Proceedings of the 8th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'08), pp.475-483, 2008. ,
An Adaptive Scheduling Method for Grid Computing, Proceedings of the 12th European Conference on Parallel and Distributed Computing, pp.188-197, 2006. ,
DOI : 10.1007/11823285_20
« Tolérance automatique aux défaillances par points de reprise et retour en arrière dans les systèmes hautes performances à passage de messages ». PhD thesis in computer science, 2006. ,
Redesigning the message logging model for high performance, International Supercomputer Conference, 2008. ,
DOI : 10.1002/cpe.1589
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.3871
Pierre Lemarinier and Frédéric Magniette. « MPICH-V2 : a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging », High Performance Networking and Computing (SC2003). Phoenix USA, 2003. ,
Pierre Lemarinier and Franck Cappello. « Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI », Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), p.97, 2005. ,
Géraud Krawezik and Franck Cappello. « Coordinated checkpoint versus message log for fault tolerant MPI », IEEE International Conference on Cluster Computing, 2003. ,
Géraud Krawezik and Franck Cappello. « Coordinated checkpoint versus message log for fault tolerant MPI », International Journal of High Performance Computing and Networking (IJHPCN), issue.3, 2004. ,
Monte Carlo methods for matrix computations on the grid, Future Generation Computer Systems, vol.24, issue.6, pp.605-612, 2008. ,
DOI : 10.1016/j.future.2007.07.006
Efficient algorithms for all-to-all communications in multiport message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.11, pp.1143-1156, 1997. ,
DOI : 10.1109/71.642949
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols, Future Generation Computer Systems, pp.73-84, 2008. ,
DOI : 10.1016/j.future.2007.02.002
URL : https://hal.archives-ouvertes.fr/hal-00688644
« Design and Evaluation of Nemesis : a Scalable, Low-Latency, Message-Passing Communication Subsystem », Proceedings of the 6th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'06), 2006. ,
An Open Cluster Environment for MPI, Proceedings of Supercomputing Symposium, pp.379-386, 1994. ,
A Scalable Process-Management Environment for Parallel Programs, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 7th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'00), pp.168-175, 2000. ,
DOI : 10.1007/3-540-45255-9_25
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.7667
A batch scheduler with high level components, CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005., pp.776-783, 2005. ,
DOI : 10.1109/CCGRID.2005.1558641
URL : https://hal.archives-ouvertes.fr/hal-00005106
Grid'5000: a large scale and highly reconfigurable grid experimental testbed, The 6th IEEE/ACM International Workshop on Grid Computing, 2005., pp.99-106, 2005. ,
DOI : 10.1109/GRID.2005.1542730
URL : https://hal.archives-ouvertes.fr/hal-00684943
HiHCoHP-Toward a realistic communication model for hierarchical hyperclusters of heterogeneous processors, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, p.42, 2001. ,
DOI : 10.1109/IPDPS.2001.924978
Alexandre Di Costanzo and Mario Leyton. « ProActive : an Integrated platform for programming and running applications on grids and P2P systems », 2006. ,
NetSolve, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '96, pp.212-223, 2000. ,
DOI : 10.1145/369028.369111
The Open Run-Time Environment (OpenRTE): A Transparent Multi-cluster Environment for High-Performance Computing, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users' Group Meeting, pp.225-232, 2005. ,
DOI : 10.1007/11557265_31
Luca Petronzio and Francesco Prelz. « The gLite Workload Management System », Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing, pp.256-268, 2009. ,
Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters, Lecture Notes in Computer Science, vol.3947, pp.175-186 ,
DOI : 10.1007/11745693_18
« Distributed Snapshots : Determining Global States of Distributed Systems », Transactions on Computer Systems, pp.63-75, 1985. ,
FATCOP: A Fault Tolerant Condor-PVM Mixed Integer Programming Solver, SIAM Journal on Optimization, vol.11, issue.4, 2001. ,
DOI : 10.1137/S1052623499353911
Fault tolerant high performance computing by a coding approach, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '05, pp.213-223, 2005. ,
DOI : 10.1145/1065944.1065973
« ScaLAPACK : A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance », PARA, pp.95-106, 1995. ,
A proposal for a set of parallel basic linear algebra subprograms, pp.107-114, 1995. ,
DOI : 10.1007/3-540-60902-4_13
« TakTuk, adaptive deployment of remote executions », Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, pp.91-100, 2009. ,
Tarek El-Ghazawi, Ashrujit Mohanti and Yiyi Yao. « An Evaluation of Global Address Space Languages : Co-Array Fortran and Unified Parallel », 2005. ,
« Tolérance aux fautes par recouvrement arrière dans les systèmes informatiques répartis ». PhD thesis in computer science, 1996. ,
« Silicon transistor hits 500GHz performance ». III-Vs Review, pp.30-31, 2006. ,
DOI : 10.1016/s0961-1290(06)71713-6
URL : http://doi.org/10.1016/s0961-1290(06)71713-6
MPI Applications on Grids: A Topology Aware Approach, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00319241
MPI Applications on Grids: A Topology Aware Approach, Proceedings of the 15th European Conference on Parallel and Distributed Computing (EuroPar'09), pp.466-477, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00319241
Adapted Version of the OpenMPI Communication Library, 2009. ,
Sylvain Peyronnet and Ala Rezmerita. « D1.2a : OpenMPI Communication Library », 2007. ,
Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI, ACM/IEEE SC 2006 Conference (SC'06), 2006. ,
DOI : 10.1109/SC.2006.15
URL : https://hal.archives-ouvertes.fr/hal-00684891
Sylvain Peyronnet, Ala Rezmerita and Franck Cappello. « Grid Services For MPI », Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08), pp.417-424, 2008. ,
Adapted Version of the OpenMPI Communication Library, 2008. ,
Thomas Herault and Franck Cappello. « Grid Services For MPI », Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'07), pp.393-394, 2007. ,
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks, IEEE Transactions on Computers, vol.36, issue.5, pp.547-553, 1987. ,
DOI : 10.1109/TC.1987.1676939
URL : http://authors.library.caltech.edu/26907/1/5206-TR-86.pdf
Adaptive loops with kaapi on multicore and grid, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.33-42, 2007. ,
DOI : 10.1145/1278177.1278185
« Communication-avoiding parallel and sequential QR factorizations », 2008. ,
DOI : 10.1137/080731992
URL : http://arxiv.org/abs/0808.2664
Wide-area communication for grids: an integrated solution to connectivity, performance and security problems, Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004., pp.97-106, 2004. ,
DOI : 10.1109/HPDC.2004.1323501
URL : https://hal.archives-ouvertes.fr/inria-00000126
PadicoTM: an open integration framework for communication middleware and runtimes, Future Generation Computer Systems, vol.19, issue.4, pp.575-585, 2003. ,
DOI : 10.1016/S0167-739X(03)00034-7
URL : https://hal.archives-ouvertes.fr/inria-00000132
Self Stabilization, Journal of Aerospace Computing, Information, and Communication, vol.1, issue.6, 2000. ,
DOI : 10.2514/1.10141
URL : https://hal.archives-ouvertes.fr/inria-00627780
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
« FT-MPI : Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World », 2000. ,
« HARNESS fault tolerant MPI design, usage and performance issues », Future Generation Computer Systems, vol.18, issue.8, pp.1127-1142, 2002. ,
« Some Computer Organizations and Their Effectiveness », IEEE Trans. Comput, vol.21, issue.9, pp.948-960, 1972. ,
« Peer-to-Peer Communication Across Network Address Translators », USENIX Annual Technical Conference, General Track (USENIX '05), pp.179-192, 2006. ,
A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems, Proceedings of the IEEE/ACM SC98 Conference, 1998. ,
DOI : 10.1109/SC.1998.10051
« The Nexus Task-parallel Runtime System », Proc. 1st Intl Workshop on Parallel Processing, pp.457-462, 1994. ,
« What is the Grid ? A Three Point Checklist », 2002. ,
Globus Toolkit Version 4: Software for Service-Oriented Systems, Journal of Computer Science and Technology, vol.10, issue.2, pp.513-520, 2006. ,
DOI : 10.1007/s11390-006-0513-y
URL : http://doi.org/10.1007/s11390-006-0513-y
Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.88-95, 1998. ,
DOI : 10.1109/PACT.1998.727176
A thread scheduling runtime system for data flow computations on cluster of multi-processors, Proceedings of the International Workshop on Parallel Symbolic Computing (PASCO'07), pp.15-23, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-00684843
Load-balancing scatter operations for grid computing, Parallel Computing, vol.30, issue.8, pp.923-946, 2004. ,
DOI : 10.1016/j.parco.2004.07.005
URL : https://hal.archives-ouvertes.fr/hal-00807380
Fault Management in P2P-MPI, Proceedings of Advances in Grid and Pervasive Computing, Second International Conference, pp.64-77, 2007. ,
DOI : 10.1007/978-3-540-72360-8_6
URL : https://hal.archives-ouvertes.fr/inria-00529974
A tool for environment deployment in clusters and light grids, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006. ,
DOI : 10.1109/IPDPS.2006.1639691
URL : https://hal.archives-ouvertes.fr/hal-00688748
Matrix Computations, 1989. ,
High-performance implementation of the level-3 BLAS, ACM Transactions on Mathematical Software, vol.35, issue.1, 2008. ,
DOI : 10.1145/1377603.1377607
« The MPI communication library : its design and a portable implementation », Proceedings of the Scalable Parallel Libraries Conference, pp.160-165, 1993. ,
« MPICH working note : Creating a new MPICH device using the channel interface », 1995. ,
« Fault Tolerance in MPI Programs », 2004. ,
A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, vol.22, issue.6, pp.789-828, 1996. ,
DOI : 10.1016/0167-8191(96)00024-5
« Parallel Seismic Ray Tracing in a Global Earth Model », Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, pp.1151-1157, 2002. ,
« SOAP Version 1.2 Message Normalization ». World Wide Web Consortium, Note NOTE-soap12, 2003. ,
« The GridWay Framework for Adaptive Scheduling and Execution on Grids », Scalable Computing : Practice and Experience, pp.1-8, 2005. ,
Interconnect agnostic checkpoint/restart in open MPI, Proceedings of the 18th ACM international symposium on High performance distributed computing, HPDC '09, pp.49-58, 2009. ,
DOI : 10.1145/1551609.1551619
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.5837
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007. ,
DOI : 10.1109/IPDPS.2007.370605
A performance analysis of the Berkeley UPC compiler, Proceedings of the 17th annual international conference on Supercomputing, ICS '03, pp.63-73, 2003. ,
DOI : 10.1145/782814.782825
« The Design and Implementation of Berkeley Lab's Linux Checkpoint », 2003. ,
Simple Linux Utility for Resource Management, Proceedings of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, pp.44-60, 2003. ,
A Portable Concurrent Object Oriented System Based on C++, Proceedings of The International Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA'93), pp.91-108, 1993. ,
« Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance », 14th International Parallel and Distributed Processing Symposium (SPDP'2000), pp.377-386, 2000. ,
A Grid-Enabled Implementation of the Message Passing Interface, 2002. ,
Quasi-opportunistic Supercomputing in Grid Environments, Algorithms and Architectures for Parallel Processing, 8th International Conference Proceedings, volume 5022 of Lecture Notes in Computer Science, pp.233-244, 2008. ,
DOI : 10.1007/978-3-540-69501-1_24
Dynamic Grid Scheduling with Job Migration and Rescheduling in the GridLab Resource Management System, Scientific Programming, pp.263-273, 2004. ,
DOI : 10.1155/2004/892169
Second Prototype and Integration of Grid Services Together with QoS-Aware Grid MW Providers, 2008. ,
A Network Topology Description Model for Grid Application Deployment, Fifth IEEE/ACM International Workshop on Grid Computing, pp.61-68, 2004. ,
DOI : 10.1109/GRID.2004.2
URL : https://hal.archives-ouvertes.fr/inria-00070773
Enabling Grids for e-Science, 2008. ,
DOI : 10.1201/9781420067682-c3
Géraud Krawezik and Franck Cappello. « Improved Message logging versus Improved coordinated checkpointing for fault tolerant MPI », IEEE International Conference on Cluster Computing, 2004. ,
« Reduction Optimization in Heterogeneous Cluster Environments », IPPS : 14th International Parallel Processing Symposium, pp.477-482, 2000. ,
« Déploiement et contrôle d'applications parallèles sur grappes de grandes tailles », 2003. ,
The ganglia distributed monitoring system: design, implementation, and experience, Parallel Computing, vol.30, issue.7, pp.817-840, 2004. ,
DOI : 10.1016/j.parco.2004.04.001
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2889
Ryousei Takano and Yutaka Ishikawa. « TCP Adaptation for MPI on Long-and-Fat Networks », Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER'05), pp.1-10, 2005. ,
Collective communication in wormhole-routed massively parallel computers, IEEE Computer, pp.39-50, 1995. ,
DOI : 10.1109/2.476198
A Remote Procedure Call API for Grid Computing, Grid Computing - GRID 2002: Third International Workshop, 2002. ,
The design and implementation of a fault-tolerant RPC system: Ninf-C, Proceedings. Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, 2004., pp.9-18, 2004. ,
DOI : 10.1109/HPCASIA.2004.1324011
« Contribution à la conception de supports exécutifs multithreads performants ». Habilitation à diriger des recherches, 2001. ,
Can cloud computing reach the top500?, Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop, UCHPC-MAW '09, 2009. ,
DOI : 10.1145/1531666.1531671
Kenneth Roche and Sathish Vadhiyar. « Numerical Libraries And The Grid : The GrADS Experiments With ScaLAPACK », 2001. ,
« Automatic and Adaptive Optimizations of MPI Collective Operations ». PhD thesis in computer science, 2007. ,
Performance Analysis of MPI Collective Operations, 19th IEEE International Parallel and Distributed Processing Symposium, p.272, 2005. ,
DOI : 10.1109/IPDPS.2005.335
Diskless checkpointing, IEEE Transactions on Parallel and Distributed Systems, pp.972-986, 1998. ,
DOI : 10.1109/71.730527
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.4662
Distributed Orthogonal Factorization: Givens and Householder Algorithms, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.6, pp.1113-1134, 1989. ,
DOI : 10.1137/0910067
« Failure Mode Assumptions and Assumption Coverage », FTCS, pp.386-395, 1992. ,
DOI : 10.1007/978-3-642-79789-7_8
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.5363
« Automatic MPI Counter Profiling of All Users : First Results on a CRAY T3E », pp.900-512, 1999. ,
Optimization of Collective Reduction Operations, Proceedings of the 4th International Conference on Computational Science, pp.1-9, 2004. ,
DOI : 10.1007/978-3-540-24685-5_1
System structure for software fault tolerance, Proceedings of the international conference on Reliable software, pp.437-449, 1975. ,
DOI : 10.1007/978-1-4612-6315-9_26
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.578
Towards a ScaLAPACK (Dense Linear Solvers) on Heterogeneous Networks of Computers, Proceedings of the 13th IEEE International Conference on High Performance Computing, pp.242-253, 2006. ,
Reliability challenges in large systems, Future Generation Computer Systems, vol.22, issue.3, pp.293-302, 2006. ,
DOI : 10.1016/j.future.2004.11.015
Private Virtual Cluster: Infrastructure and Protocol for Instant Grids, Proceedings of the 12th European Conference on Parallel and Distributed Computing, pp.393-404, 2006. ,
DOI : 10.1007/11823285_41
« MRNet : A Software-Based Multicast/Reduction Network for Scalable Tools », Proceedings of the International Conference for High Performance Networking Computing, Networking, Storage and Analysis (SC|03), 2003. ,
« Fail Stop Processors : An Approach to Designing Fault-Tolerant Computing Systems », ACM Transactions on Computer Systems, vol.1, pp.222-238, 1983. ,
« Defending Against Sequence Number Attacks », RFC 1948, AT&T Research, 1996. ,
« Optimistic Recovery in Distributed Systems », Transactions on Computer Systems, pp.204-226, 1985. ,
« RPC : Remote Procedure Call Protocol Specification, Version 2 », RFC 1057, 1988. ,
DOI : 10.17487/rfc1050
PVM: A framework for parallel distributed computing, Concurrency : Practice and Experience, pp.315-339, 1990. ,
DOI : 10.1002/cpe.4330020404
Fumihiro Okazaki and Yutaka Ishikawa. « Effects of packet pacing for MPI programs in a Grid environment », CLUSTER, pp.382-391, 2007. ,
« Ninf-G : A Reference Implementation of RPC-based Programming Middleware for Grid Computing », Journal of Grid Computing, vol.1, issue.1, pp.41-51, 2003. ,
DOI : 10.1023/A:1024083511032
Design, Implementation and Performance Evaluation of GridRPC Programming Middleware for a Large-Scale Computational Grid, Fifth IEEE/ACM International Workshop on Grid Computing, 2004. ,
DOI : 10.1109/GRID.2004.20
Condor - A Distributed Job Scheduler, Beowulf Cluster Computing with Linux, 2001. ,
Condor and the Grid, Grid Computing : Making the Global Infrastructure a Reality, 2002. ,
DOI : 10.1002/0470867167.ch11
Improving the Performance of Collective Operations in MPICH, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'03), pp.257-267, 2003. ,
DOI : 10.1007/978-3-540-39924-7_38
Standardization of an API for Distributed Resource Management Systems, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), pp.619-626, 2007. ,
DOI : 10.1109/CCGRID.2007.109
« Automated Empirical Optimization of Software and the ATLAS Project », Parallel Comput, vol.27, issue.12, pp.3-25, 2001. ,
The Evolution of Network Enabled Solver, Grid-Based Problem Solving Environments : IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments, pp.215-226, 2006. ,
« Scheduling divisible loads in the dynamic heterogeneous grid environment », Xiaohua Jia (ed.), Proceedings of the 1st International Conference on Scalable Information Systems (Infoscale 2006), ACM International Conference Proceeding Series, vol.152, ACM, 2006. ,
The first of these services concerns the application's life cycle: its deployment, launch and termination and, during execution, the monitoring of its state and the behavior to adopt in case of failure. The other service provided by the runtime environment is to connect the application's processes so that they can communicate using the communication library. Its functionality can then be broken down into three categories: deploying and launching the application, the runtime environment's internal communications (collective and point-to-point) ,
scale of the runtime environment itself, through the performance of its main functions: application launch and internal communications. Since failures are unavoidable in a large-scale system, I then studied fault-tolerance mechanisms ,
particular type of large-scale system, namely computing grids formed by aggregating clusters, by proposing an MPI communication environment suited to grid communications in terms of security requirements and built on a runtime environment ,