
Statistical Approaches for Segmentation : Application to Genome Annotation

Abstract: Transcriptome sequencing technologies (RNA-Seq) are becoming a valuable tool for genome annotation, gene expression analysis, and new-transcript discovery. In this context, we propose to model RNA-Seq output with the negative binomial distribution and to build segmentation models suited to its study at different biological scales. We develop a fast segmentation algorithm to analyze series covering whole chromosomes, and we propose two methods for estimating the number of segments. This number is a key quantity, related to the number of genes expressed in the cell, whether those genes were identified in previous experiments or are discovered in the present one. Research on precise gene annotation, and in particular the comparison of transcription boundaries across individuals, naturally leads us to the statistical comparison of change-points in independent series. To address these questions, we build tools within a Bayesian segmentation framework that provide measures of uncertainty. We illustrate our models, all implemented in R packages, on an RNA-Seq dataset from a study on yeast, and show, for instance, that intron boundaries are conserved across conditions while the beginning and end of transcripts are subject to differential splicing.
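The abstract does not spell out the segmentation algorithm, so the following is only an illustrative sketch of the basic idea of segmenting a count series under a negative binomial model. It uses a plain exact dynamic program (optimal partitioning into K segments, O(K n²) time), not the faster algorithm developed in the thesis. It assumes the dispersion (`size`) parameter is known and shared by all segments, that each segment has its own mean, and that the number of segments K is fixed in advance; estimating K is not covered. The function name `nb_segmentation` is hypothetical.

```python
import math
from typing import List, Tuple


def nb_segmentation(y: List[int], n_segments: int, size: float) -> Tuple[List[int], float]:
    """Hypothetical sketch: exact optimal partitioning of a count series into
    n_segments pieces under a negative binomial model with a known, shared
    size (dispersion) parameter and a segment-specific mean (assumptions)."""
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(y)")

    # Prefix sums let each segment cost be evaluated in O(1).
    cum_y = [0.0] * (n + 1)  # running sum of counts
    cum_g = [0.0] * (n + 1)  # running sum of lgamma(y + size) - lgamma(y + 1)
    for i, yi in enumerate(y):
        cum_y[i + 1] = cum_y[i] + yi
        cum_g[i + 1] = cum_g[i] + math.lgamma(yi + size) - math.lgamma(yi + 1)

    def cost(s: int, t: int) -> float:
        """Negative log-likelihood of y[s:t] with the mean at its MLE.
        For a fixed size, the MLE of the negative binomial mean is the sample mean."""
        m = t - s
        total = cum_y[t] - cum_y[s]
        mu = total / m
        p = size / (size + mu)  # NB(size, p) parametrization
        ll = cum_g[t] - cum_g[s] - m * math.lgamma(size) + m * size * math.log(p)
        if total > 0:  # an all-zero segment has p == 1 and no y*log(1-p) term
            ll += total * math.log(1.0 - p)
        return -ll

    # best[k][t]: minimal cost of y[:t] split into k segments;
    # arg[k][t]: start index of the last of those k segments.
    best = [[math.inf] * (n + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):
                c = best[k - 1][s] + cost(s, t)
                if c < best[k][t]:
                    best[k][t], arg[k][t] = c, s

    # Backtrack the segment start positions, then drop the leading 0
    # so that only the change-points (starts of segments 2..K) remain.
    starts, t = [], n
    for _ in range(n_segments):
        t = arg[n_segments - len(starts)][t]
        starts.append(t)
    return sorted(starts)[1:], best[n_segments][n]


if __name__ == "__main__":
    counts = [1, 2, 1, 0, 2, 40, 42, 38, 41, 39, 2, 1, 0, 1, 2]
    change_points, nll = nb_segmentation(counts, n_segments=3, size=10.0)
    print(change_points)  # [5, 10]
```

Because the maximum-likelihood mean of a segment is simply its sample mean when `size` is fixed, prefix sums make every segment cost constant-time; the remaining cost is the triple loop over (k, t, s), which is what a faster, pruned algorithm would aim to avoid on whole-chromosome series.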

Cited literature: 167 references
Contributor: ABES STAR
Submitted on: Wednesday, December 4, 2013 - 2:37:15 PM
Last modification on: Friday, August 5, 2022 - 2:38:10 PM
Long-term archiving on: Saturday, April 8, 2017 - 3:34:48 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00913851, version 1
  • PRODINRA : 314891



Alice Cleynen. Statistical Approaches for Segmentation : Application to Genome Annotation. General Mathematics [math.GM]. Université Paris Sud - Paris XI, 2013. English. ⟨NNT : 2013PA112258⟩. ⟨tel-00913851⟩


