Data Integration in the Life Sciences: Scientific Workflows, Provenance, and Ranking

Sarah Cohen-Boulakia

HDR thesis, Year: 2015

Sarah Cohen-Boulakia

Role: Author
PersonId : 15627
IdHAL : sarah-cohen-boulakia
ORCID : 0000-0002-7439-1441
IdRef : 09578960X

Scientific Data Management

Modeling plant morphogenesis at different scales, from genes to phenotype

Laboratoire de Recherche en Informatique

Abstract

Biological research is a science that derives its findings from the proper analysis of experiments. Today, a large variety of experiments are carried out in hundreds of labs around the world, and their results are reported in a myriad of databases, websites, publications, etc., using different formats, conventions, and schemas. Providing uniform access to these diverse and distributed databases is the aim of data integration solutions, which have been designed and implemented within the bioinformatics community for more than 20 years. However, the perception of the data integration problem in the life sciences has changed: while early approaches concentrated on handling schema-dependent queries over heterogeneous and distributed databases, current research emphasizes instances rather than schemas, tries to place the human back into the loop, and intertwines data integration and data analysis. Transparency (providing users with the illusion that they are using a centralized database, and thus completely hiding the original databases) was one of the main goals of federated databases. It is no longer a target. Instead, users want to know exactly which data from which source was used in which way in a study (provenance). The old model of "first integrate, then analyze" is replaced by a new, process-oriented paradigm: "integration is analysis, and analysis is integration". This paradigm change gives rise to some important research trends. First, the process of integration itself, i.e., the integration workflow, is becoming a research topic in its own right. Scientific workflows actually implement the paradigm "integration is analysis". A second trend is the growing importance of sensible ranking: as data sets keep growing, it becomes increasingly difficult for the biologist user to distinguish relevant data within large and noisy data sets. This HDR thesis outlines my contributions to the field of data integration in the life sciences.
More precisely, my work takes place in the two contexts mentioned above, namely scientific workflows and biological data ranking. The reported results were obtained from 2005 to late 2014, first as a postdoctoral fellow at the University of Pennsylvania (Dec 2005 to Aug 2007) and then as an Associate Professor at Université Paris-Sud (LRI, UMR CNRS 8623, Bioinformatics team) and Inria (Saclay-Ile-de-France, AMIB team, 2009-2014).

Keywords

Integration of biological data; Scientific workflows; Provenance; Data ranking

Domains

Bioinformatics [q-bio.QM]; Data Structures and Algorithms [cs.DS]; Databases [cs.DB]; Information Retrieval [cs.IR]

Main file

cohenboulakiaHDR.pdf (4.69 MB)

https://hal.science/tel-01245229

Submitted on: Wednesday, 16 December 2015, 21:38:01

Last modified on: Friday, 9 February 2024, 03:25:24

Long-term archiving on: Thursday, 17 March 2016, 17:00:35

Dates and versions

tel-01245229, version 1 (16-12-2015)

License

Attribution

Identifiers

HAL Id: tel-01245229, version 1

Cite

Sarah Cohen-Boulakia. Data Integration in the Life Sciences: Scientific Workflows, Provenance, and Ranking. Bioinformatics [q-bio.QM]. Université Paris-Sud, 2015. ⟨tel-01245229⟩

Collections

CIRAD CNRS INRIA INRA UMR8623 ZENITH LIRMM CENTRALESUPELEC AGROPOLIS INRIA2 LRI-BIOINFO UNIV-PARIS-SACLAY MIPS UNIV-MONTPELLIER INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER AGAP

991 views

254 downloads
