Concevoir et partager des workflows d’analyse de données. Application aux traitements intensifs en bioinformatique

Abstract: Designing and sharing data analysis workflows. Application to intensive processing in bioinformatics.

As part of an Open Science initiative, we focus on scientific Workflow Management Systems (WfMS) and their application to intensive data analysis in bioinformatics. We start from the assumption that WfMS can evolve into efficient hubs that speed up the development and dissemination of innovative analysis methods. These software platforms could bring together, around a disciplinary theme, not only the current stakeholders, who are service consumers, but also the service producers. We therefore consider that these environments must be adapted to the practices of the scientists who design methods and must also offer increased productivity during design and processing. These constraints lead us to study the rapid capture of workflows, the simplification of the integration of technical tasks such as parallelisation, and the customization of deployment. First, we define an expressive graphical workflow language suited to the quick capture of workflows. This language is interpreted by a workflow engine based on a new model of computation, which achieves high performance through multiple levels of parallelism. Then, we present a Model-Driven design approach that facilitates the generation of data parallelism and the production of implementations suited to different execution contexts. In particular, we describe the integration of a meta-model of components and platforms used to automate the configuration of workflow dependencies. Finally, for the Container as a Service (CaaS) cloud model, we develop a workflow specification that is intrinsically re-executable and easy to disseminate. Adopting this kind of model could accelerate exchanges and improve the availability of data analysis workflows.
Document type: Theses

Cited literature [154 references]

https://hal.inria.fr/tel-01233191
Contributor: Francois Moreews
Submitted on: Tuesday, November 24, 2015 - 4:10:37 PM
Last modification on: Friday, January 11, 2019 - 2:28:06 PM
Long-term archiving on: Saturday, April 29, 2017 - 1:51:45 AM

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : tel-01233191, version 1

Citation

Francois Moreews. Concevoir et partager des workflows d’analyse de données. Application aux traitements intensifs en bioinformatique. Bio-informatique [q-bio.QM]. Université de Rennes 1, 2015. French. ⟨tel-01233191⟩

Metrics

Record views: 651
File downloads: 1307