
Resiliency in Distributed Workflow Systems for Numerical Applications

Laurentiu Trifan 1
1 OPALE - Optimization and control, numerical algorithms and integration of complex multidiscipline systems governed by PDE
CRISAM - Inria Sophia Antipolis - Méditerranée , JAD - Laboratoire Jean Alexandre Dieudonné : UMR6621
Abstract : This thesis aims to conceive an environment for high-performance computing dedicated to numerical optimization applications. The design and optimization tools belong to different academic and industrial teams that collaborate within the same projects. These tools must be federated in a common environment in order to facilitate access for researchers and engineers. The environment that we propose to meet these requirements is composed of a workflow system and a distributed computing system. The goal of the first is to facilitate application design, while the second is in charge of executing the application on distributed resources. A set of communication services between the two systems must also be developed. The computation phase must be carried out efficiently with regard to the parallelism of some codes, the synchronous and asynchronous execution of tasks, data transfer, and the available hardware and software resources. Moreover, the environment must provide a sufficient level of fault tolerance against both hardware and software failures, to minimize their influence on the final result or the computation time. An important condition is to implement solutions for restarting the application after an error occurs, such that the time spent handling the error remains less than the time needed to completely restart the application. In our case, we focused on the Yawl workflow system, since it presents good characteristics in terms of i) hardware and software independence, ii) fault-tolerance mechanisms. Regarding the distributed execution, our tests were deployed on the Grid5000 platform, using up to 64 different machines located on 5 geographic sites. This document presents the design choices and the extensions made to Yawl in order to run on a distributed platform.

Cited literature [67 references]
Contributor : Toan Nguyen
Submitted on : Monday, December 2, 2013 - 11:10:04 AM
Last modification on : Wednesday, October 14, 2020 - 4:24:29 AM
Long-term archiving on : Monday, March 3, 2014 - 9:05:33 AM


  • HAL Id : tel-00912491, version 1


Laurentiu Trifan. Resiliency in Distributed Workflow Systems for Numerical Applications. Performance [cs.PF]. Université de Grenoble, 2013. English. ⟨tel-00912491⟩


