Apprentissage statistique pour l'intégration de données omiques (Statistical learning for omics data integration)

Abstract: The development of high-throughput sequencing technologies has led to the production of high-dimensional, heterogeneous datasets at different scales of living systems. Integrative methods have proven relevant for processing such data, but they remain challenging. This thesis gathers methodological contributions for the simultaneous exploration of heterogeneous multi-omics datasets. Kernels and kernel methods provide a natural framework for this problem because they account for the specific nature of each dataset while allowing the datasets to be combined. However, when the number of samples to process is large, kernel methods suffer from several drawbacks: their complexity increases and the interpretability of the model is lost. The first part of my work focuses on adapting two exploratory kernel methods: kernel principal component analysis (K-PCA) and the kernel self-organizing map (K-SOM). The proposed adaptations first address the scaling of both K-SOM and K-PCA to omics datasets, and second improve the interpretability of the models. In the second part, I focus on multiple kernel learning for combining multiple omics datasets. The efficiency of the proposed methods is illustrated in the domain of microbial ecology: eight TARA Oceans datasets are integrated and analysed using K-PCA.

Cited literature: 188 references

https://tel.archives-ouvertes.fr/tel-01666744
Contributor: Jérôme Mariette
Submitted on: Wednesday, December 20, 2017 - 9:44:14 AM
Last modification on: Friday, December 7, 2018 - 1:15:45 AM

File

Mariette_Jerome.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-01666744, version 2

Citation

Jérôme Mariette. Apprentissage statistique pour l'intégration de données omiques. Bio-informatique [q-bio.QM]. UPS Toulouse - Université Toulouse 3 Paul Sabatier, 2017. French. ⟨tel-01666744v2⟩
