
Designing and Sharing Data Analysis Workflows: Application to Intensive Processing in Bioinformatics

François Moreews 1, 2
2 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA-D7 - Data and Knowledge Management
Abstract: As part of an Open Science initiative, we focus on scientific Workflow Management Systems (WfMS) and their applications to intensive data analysis in bioinformatics. We start from the assumption that WfMS can evolve into efficient hubs that speed up the development and dissemination of innovative analysis methods. These software platforms could bring together, around a disciplinary theme, not only the current stakeholders, who are service consumers, but also the service producers. We therefore consider that these environments must both fit the practices of the scientists who design methods and increase productivity during design and processing. These constraints lead us to study the rapid capture of workflows, the simplified integration of technical tasks such as parallelization, and the customization of deployment. First, we define an expressive graphical workflow language suited to the rapid capture of workflows. It is interpreted by a workflow engine based on a new model of computation, which achieves high performance by exploiting multiple levels of parallelism. Then, we present a Model-Driven design approach that facilitates the generation of data parallelism and the production of implementations suited to different execution contexts. In particular, we describe the integration of a meta-model of components and platforms, used to automate the configuration of workflow dependencies. Finally, for the Container as a Service (CaaS) cloud model, we develop a workflow specification that is intrinsically re-executable and easy to disseminate. Adopting this kind of model could accelerate exchanges and improve the availability of data analysis workflows.
Document type: Thesis

Cited literature: 154 references

https://tel.archives-ouvertes.fr/tel-01308297
Contributor: Abes Star
Submitted on: Wednesday, April 27, 2016 - 3:02:27 PM
Last modification on: Tuesday, October 19, 2021 - 11:58:57 PM
Long-term archiving on: Thursday, July 28, 2016 - 10:50:14 AM

File

MOREEWS_Francois.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-01308297, version 2

Citation

François Moreews. Concevoir et partager des workflows d’analyse de données : application aux traitements intensifs en bioinformatique. Bioinformatics [q-bio.QM]. Université Rennes 1, 2015. In French. ⟨NNT : 2015REN1S089⟩. ⟨tel-01308297v2⟩


Metrics

Record views: 664

File downloads: 4402