
Classification et inférence de réseaux pour les données RNA-seq (Clustering and network inference for RNA-seq data)

Abstract : This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete, and the number of samples sequenced is usually small because of the cost of the technology. These two points are the main statistical challenges in modelling RNA-seq data. The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model combined with a simple transformation of the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion that selects the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This criterion is not specific to RNA-seq data: it is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases. The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.

Contributor : Abes Star
Submitted on : Monday, January 2, 2017 - 1:19:27 AM
Last modification on : Wednesday, April 20, 2022 - 3:37:36 AM
Long-term archiving on : Monday, April 3, 2017 - 8:27:50 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01424124, version 1


Mélina Gallopin. Classification et inférence de réseaux pour les données RNA-seq. Statistics [math.ST]. Université Paris-Saclay, 2015. French. ⟨NNT : 2015SACLS174⟩. ⟨tel-01424124⟩


