R. Gupta, P. Beckman, B. Park, E. Lusk, P. Hargrove et al., CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems, 2009 International Conference on Parallel Processing, pp.237-245, 2009.
DOI : 10.1109/ICPP.2009.20

K. Plankensteiner, R. Prodan, T. Fahringer, A. Kertesz, and P. Kacsuk, Fault-tolerant behavior in state-of-the-art Grid Workflow Management Systems, 2003.

W. Zang, M. Yu, and P. Liu, A Distributed Algorithm for Workflow Recovery, International Journal of Intelligent Control and Systems, vol.3, issue.79, p.118, 2007.

W. M. Van-der-aalst, M. Adams, A. H. Ter-hofstede, M. Pesic, and H. Schonenberg, Flexibility as a Service, 2008.
DOI : 10.1007/978-3-642-04205-8_27

M. Adams, A. H. Ter-hofstede, W. M. Van-der-aalst, and D. Edmond, Dynamic, Extensible and Context-Aware Exception Handling for Workflows, Proceedings of the 15th International Conference on Cooperative Information Systems, 2003.

P. Compton, G. Edwards, B. Kang, L. Lazarus, R. Malor et al., Ripple Down Rules: Possibilities and Limitations, Handbook of Research on P2P and Grid Systems for Service-Oriented Computing: Models, Methodologies and Applications, p.118, 1991.

D. C. Marinescu, G. M. Marinescu, Y. Ji, L. Boloni, and H. J. Siegel, Ad hoc grids: communication and computing in a power constrained environment, Conference Proceedings of the 2003 IEEE International, pp.113-122, 2003.
DOI : 10.1109/PCCC.2003.1203690

L. W. Mcknight, J. Howison, and S. Bradner, Guest Editors' Introduction: Wireless Grids--Distributed Resource Sharing by Mobile, Nomadic, and Fixed Devices, IEEE Internet Computing, vol.8, issue.4, pp.24-31, 2004.
DOI : 10.1109/MIC.2004.14

W. Li, Z. Tommy, B. Xu, Y. Li, and . Gong, The Vega Personal Grid: A Lightweight Grid Architecture, p.11

R. Desmarais and H. Muller, A Proposal for an Autonomic Grid Management System, International Workshop on Software Engineering for Adaptive and Self-Managing Systems (SEAMS '07), p.11, 2007.
DOI : 10.1109/SEAMS.2007.1

C. Hewitt, ORGs for Scalable, Robust, Privacy-Friendly Client Cloud Computing, IEEE Internet Computing, vol.12, issue.5, pp.96-99, 2008.
DOI : 10.1109/MIC.2008.107

K. Krauter, R. Buyya, and M. Maheswaran, A Taxonomy and Survey of Grid Resource Management Systems, Software Practice and Experience, pp.135-164, 2002.

B. Barney, Message Passing Interface, 2012.

F. Cappello, E. Caron, M. Dayde, F. Desprez, E. Jeannot et al., Grid'5000: a large scale, reconfigurable, controlable and monitorable Grid platform, Grid'2005 Workshop, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000284

M. C. Cera, Y. Georgiou, O. Richard, N. Maillard, and P. O. Navaux, Supporting MPI Malleable Applications upon the OAR Resource Manager, p.16, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00691414

E. Jeanvoine, L. Sarzyniec, and L. Nussbaum, Kadeploy3: Efficient and Scalable Operating System Provisioning for HPC Clusters, p.16, 2012.

I. A. Foster, R. Amar, A. Bolze, A. Bouteiller, Y. Chis et al., Software for Service-Oriented Systems DIET: New Developments and Recent Results, IFIP International Conference on Network and Parallel Computing , number 3779 in LNCS Laboratoire de l'Informatique du Parallélisme (LIP), pp.2-13, 1920.

M. Treaster, A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems, ACM Computing Research Repository (CoRR), vol.501002, pp.1-11, 2005.

B. Schroeder and G. A. Gibson, Understanding failures in petascale computers, Journal of Physics: Conference Series, vol.78, issue.1, p.27, 2007.
DOI : 10.1088/1742-6596/78/1/012022

X. Besseron, Tolérance aux fautes et reconfiguration dynamique pour les applications distribuées à grande échelle, Thèse, Institut National Polytechnique de Grenoble - INPG, 1927.

E. Heien, D. Kondo, A. Gainaru, D. Lapine, B. Kramer et al., Modeling and tolerating heterogeneous failures in large parallel systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.451-4511, 2011.
DOI : 10.1145/2063384.2063444

Z. Lan and Y. Li, Adaptive Fault Management of Parallel Applications for High-Performance Computing, IEEE Transactions on Computers, vol.57, issue.12, pp.1647-1660, 2008.
DOI : 10.1109/TC.2008.90

Y. Li and Z. Lan, Exploit failure prediction for adaptive fault-tolerance in cluster computing, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp.8-538, 1928.
DOI : 10.1109/CCGRID.2006.45

Z. Chen and J. Dongarra, Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources.

F. Cappello, Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities, Parallel and Distributed Processing Symposium, pp.374-388, 2006.

A. Guermouche, T. Ropars, E. Brunet, M. Snir, and F. Cappello, Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.989-1000, 2011.
DOI : 10.1109/IPDPS.2011.95
URL : https://hal.archives-ouvertes.fr/hal-01121937

A. Oliner, L. Rudolph, and R. Sahoo, Cooperative checkpointing theory, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pp.10-32, 2006.
DOI : 10.1109/IPDPS.2006.1639368

N. Naksinehaboon, Y. Liu, C. Leangsuksun, R. Nassar, M. Paun et al., Reliability-Aware Approach: An Incremental Checkpoint

X. Besseron, S. Jafar, T. Gautier, and J. Roch, CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in Kaapi, 2006 2nd International Conference on Information & Communication Technologies, pp.3353-3358, 2006.
DOI : 10.1109/ICTTA.2006.1684955
URL : https://hal.archives-ouvertes.fr/hal-00684864

A. Bouteiller, P. Lemarinier, K. Krawezik, and F. Cappello, Coordinated checkpoint versus message log for fault tolerant MPI, Proceedings IEEE International Conference on Cluster Computing CLUSTR-03, pp.242-250, 2003.
DOI : 10.1109/CLUSTR.2003.1253321

L. Bautista-gomez, S. Tsuboi, D. Komatitsch, F. Cappello, N. Maruyama et al., FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-3232, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216

J. Bent, B. Mcclelland, G. Gibson, P. Nowoczynski, G. Grider et al., PLFS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.33, 2009.
DOI : 10.1145/1654059.1654081

J. S. Plank, K. Li, and M. A. Puening, Diskless checkpointing, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.10, pp.972-986, 1998.
DOI : 10.1109/71.730527

B. Nicolae and F. Cappello, BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-3412, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865

T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger et al., Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics, vol.20, issue.17, pp.3045-3054, 2004.
DOI : 10.1093/bioinformatics/bth361

P. Fisher, H. Noyes, S. Kemp, R. Stevens, and A. Brass, A Systematic Strategy for the Discovery of Candidate Genes Responsible for Phenotypic Variation, Cardiovascular Genomics, pp.329-345, 2009.
DOI : 10.1007/978-1-60761-247-6_18

M. Caeiro-rodriguez, T. Priol, and Z. Nemeth, Dynamicity in Scientific Workflows, Institute on Grid Information, Resource and Workflow Monitoring Services, pp.38-43, 2008.

M. Ghanem, N. Azam, M. Boniface, and J. Ferris, Grid-Enabled Workflows for Industrial Product Design, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), pp.96-130, 2006.
DOI : 10.1109/E-SCIENCE.2006.261180

M. Vasko and S. Dustdar, A view based analysis of workflow modeling languages, 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP'06), pp.293-300, 2006.
DOI : 10.1109/PDP.2006.17

E. Deelman and Y. Gil, Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges, In e-Science and Grid Computing, 2006. e-Science '06, Second IEEE International Conference on, pp.144-178, 2006.

E. Deelman, S. Callaghan, E. Field, H. Francoeur, R. Graves et al., Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), pp.4-6, 2006.
DOI : 10.1109/E-SCIENCE.2006.261098

J. Wang, I. Altintas, C. Berkley, L. Gilbert, and M. B. Jones, A High-Level Distributed Execution Framework for Scientific Workflows, IEEE Fourth International Conference on eScience, pp.8-34, 2008.

J. Montagnat, D. Lingrand, and X. Pennec, Flexible and efficient workflow deployment of data-intensive applications on grids with MOTEUR, To appear in the special issue on Workflow Systems in Grid Environments, pp.3-34, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00459130

Z. Zhao, A. Belloum, H. Yakali, P. Sloot, and B. Hertzberger, Dynamic Workflow in a Grid Enabled Problem Solving Environment, CIT '05 Proceedings of the Fifth International Conference on Computer and Information Technology, p.36, 2005.

S. Shankar and D. J. Dewitt, Data driven workflow planning in cluster management systems, Proceedings of the 16th international symposium on High performance distributed computing , HPDC '07, pp.127-136, 2007.
DOI : 10.1145/1272366.1272383

M. Shields, Control- Versus Data-Driven Workflows, Science, pp.167-173, 2007.
DOI : 10.1007/978-1-84628-757-2_11

M. Adams, A. H. Ter-hofstede, D. Edmond, and W. M. Van-der-aalst, Implementing Dynamic Flexibility in Workflows using Worklets, p.53, 2006.

D. Abramson, C. Enticott, and I. Altintas, Nimrod/K: Towards massively parallel dynamic Grid workflows, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5215726
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

K. Plankensteiner, R. Prodan, and T. Fahringer, A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact, 2009 Fifth IEEE International Conference on e-Science, p.44, 2009.
DOI : 10.1109/e-Science.2009.51

J. Yu and R. Buyya, A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, vol.15, issue.5-6, p.49, 2005.
DOI : 10.1007/s10723-005-9010-8

G. Kandaswamy, A. Mandal, and D. A. Reed, Fault Tolerance and Recovery of Scientific Workflows on Computational Grids, 8th IEEE International Symposium on Cluster Computing and the Grid, 2008.

X. Besseron and T. Gautier, Optimised Recovery with a Coordinated Checkpoint/Rollback Protocol for Domain Decomposition Applications, Modelling, Computation and Optimization in Information Systems and Management Sciences, vol.14 of Communications in Computer and Information Science, pp.497-506, 2008.

L. Zongwei, Checkpointing for workflow recovery, Proceedings of the 38th annual on Southeast regional conference, pp.79-80, 2000.

C. Buligon, S. Cechin, and I. Jansch-pôrto, Implementing Rollback-Recovery Coordinated Checkpoints, Proceedings of the 5th international conference on Advanced Distributed Systems, ISSADS'05, pp.246-257, 2005.
DOI : 10.1007/11533962_22

L. Ramakrishnan, C. Koelbel, Y. Kee, R. Wolski, D. Nurmi et al., VGrADS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-4712, 2009.
DOI : 10.1145/1654059.1654107

W. Bland, P. Du, A. Bouteiller, T. Herault, G. Bosilca et al., A checkpoint-on-failure protocol for algorithm-based recovery in standard MPI, Proceedings of the 18th international conference on Parallel Processing, Euro-Par'12, pp.477-488, 2012.

T. Fahringer, R. Prodan, D. Rubing, F. Nerieri, S. Podlipnig et al., ASKALON: a Grid application development and computing environment, The 6th IEEE/ACM International Workshop on Grid Computing, pp.10-48, 2005.
DOI : 10.1109/GRID.2005.1542733
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

R. Duan, R. Prodan, and T. Fahringer, DEE: A Distributed Fault Tolerant Workflow Enactment Engine for Grid Computing, High Performance Computing and Communications, 3726 of Lecture Notes in Computer Science, pp.704-716
DOI : 10.1007/11557654_81

I. Altintas, A. Birnbaum, K. K. Baldridge, W. Sudholt, M. Miller et al., A Framework for the Design and Reuse of Grid Workflows, International Workshop on Scientific Aspects of Grid Computing, pp.120-133, 2005.
DOI : 10.1007/11423287_11

D. Crawl and I. Altintas, A Provenance-Based Fault Tolerance Mechanism for Scientific Workflows, Intl. Provenance and Annotation Workshop (IPAW).

Workflow Patterns: On the Expressive Power of (Petri-net-based) Workflow Languages, pp.1-20, 2002.

M. Adams, A. H. Ter-hofstede, W. M. Van-der-aalst, and N. Russell, Modern Business Process Automation, p.53, 2010.

T. Nguyên, L. Trifan, and J. Désidéri, A Distributed Workflow Platform for Simulation, Proceedings of the 4th International Conference on Advanced Engineering Computing and Applications in Sciences, p.57, 2010.

E. Sindrilaru, A. Costan, and V. Cristea, Fault Tolerance and Recovery in Grid Workflow Management Systems, 2010 International Conference on Complex, Intelligent and Software Intensive Systems, pp.475-480, 2010.
DOI : 10.1109/CISIS.2010.113

M. Yu, P. Liu, and W. Zang, The implementation and evaluation of a recovery system for workflows, Journal of Network and Computer Applications, vol.32, issue.1, pp.158-183, 2009.
DOI : 10.1016/j.jnca.2008.03.007

R. Le-riche, D. Caromel, and R. Duvigneau, Optimization tools and applications developed during the OMD & OMD2 projects, Forum Teratech 2011, Complex systems engineering workshop (atelier ingénierie des systèmes complexes), p.67, 2011.
URL : https://hal.archives-ouvertes.fr/emse-00686596

A. Zerbinati, J. Désidéri, and R. Duvigneau, Application of Metamodel-Assisted Multiple-Gradient Descent Algorithm (MGDA) to Air-Cooling Duct Shape Optimization, ECCOMAS - European Congress on Computational Methods in Applied Sciences and Engineering - 2012, p.67, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00742948

J. Eder and W. Liebhart, Workflow recovery, Proceedings First IFCIS International Conference on Cooperative Information Systems, pp.124-134, 1996.
DOI : 10.1109/COOPIS.1996.555004

P. K. Mantha, N. Kim, A. Luckow, J. Kim, and S. Jha, Understanding mapreduce-based next-generation sequencing alignment on distributed cyberinfrastructure, Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences, ECMLS '12, pp.3-12, 2012.

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008.