Classification et inférence de réseaux pour les données RNA-seq

Abstract : This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete, and the number of samples sequenced is usually small due to the cost of the technology. These two points are the main statistical challenges in modelling RNA-seq data.

The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model applied to the data after a simple transformation is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion that selects the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This model selection criterion is not specific to RNA-seq data. It is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases.

The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.
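As an illustration of the block-diagonal idea mentioned in the abstract, the following minimal Python sketch thresholds the absolute entries of a covariance matrix and groups variables (genes) into connected components, so that each component could be treated as a separate, smaller network inference problem. The function name `covariance_blocks`, the toy matrix, and the fixed threshold are assumptions made for illustration only. The thesis chooses the decomposition with a non-asymptotic approach that is not reproduced here.

```python
def covariance_blocks(cov, threshold):
    """Group variable indices into blocks.

    Two indices i and j end up in the same block when they are linked by a
    chain of entries with |cov[i][j]| > threshold (connected components of
    the thresholded matrix). Hypothetical helper, not the thesis's method.
    """
    p = len(cov)
    parent = list(range(p))  # union-find forest, one tree per block

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the blocks of every pair whose covariance exceeds the threshold.
    for i in range(p):
        for j in range(i + 1, p):
            if abs(cov[i][j]) > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect the indices belonging to each root.
    blocks = {}
    for i in range(p):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Toy 4x4 covariance matrix (made-up values): variables 0-1 and 2-3 are
# strongly related within each pair and only weakly related across pairs.
cov = [
    [1.0, 0.8, 0.0, 0.05],
    [0.8, 1.0, 0.02, 0.0],
    [0.0, 0.02, 1.0, 0.6],
    [0.05, 0.0, 0.6, 1.0],
]
blocks = covariance_blocks(cov, threshold=0.1)
print(blocks)
```

With a fixed threshold the result depends entirely on that arbitrary choice: a lower threshold merges blocks and a higher one splits them, which is why a data-driven selection of the decomposition matters in practice.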
https://tel.archives-ouvertes.fr/tel-01424124
Contributor : Abes Star
Submitted on : Monday, January 2, 2017 - 1:19:27 AM
Last modification on : Friday, May 17, 2019 - 10:54:35 AM
Long-term archiving on : Monday, April 3, 2017 - 8:27:50 PM

File

73364_GALLOPIN_2015_diffusion....
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01424124, version 1

Citation

Mélina Gallopin. Classification et inférence de réseaux pour les données RNA-seq. Statistics [math.ST]. Université Paris-Saclay, 2015. French. ⟨NNT : 2015SACLS174⟩. ⟨tel-01424124⟩
