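Development of methods and algorithms for the characterization and annotation of transcriptomes with high-throughput sequencers (original title: Développement de méthodes et d'algorithmes pour la caractérisation et l'annotation des transcriptomes avec les séquenceurs haut débit)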

Abstract : Since their introduction, high-throughput sequencers have revolutionized transcriptomic studies at genome scale. They can generate millions, or even billions, of short sequences called reads. New transcriptomic approaches, such as Digital Gene Expression (DGE) and RNA-sequencing (RNA-Seq), enable the identification, quantification, and reconstruction of all the transcripts of a cell, even rare ones. Among these transcripts are regulatory non-coding RNAs, alternative splice variants, which code for novel proteins, and non-colinear transcripts termed chimeras (generated by either gene fusion or trans-splicing). Characterizing these transcripts is an algorithmic as well as a biological challenge, owing to their differences in nature, their diverse implications in physiological and cellular processes, and, for some, their role in cancer development.

In this work, we focus on algorithms and methods for the characterization and annotation of transcriptomes. First, we propose a statistical study of DGE to assess the impact of sequencing errors on the analysis. From this study, we developed a pipeline for DGE annotation. Through this initial work, we demonstrated that a large amount of information is shared between reads. This property led us to design the Gk arrays, an indexing data structure for organizing huge numbers of reads in memory, together with algorithms to query this structure quickly. Finally, building on the Gk arrays, we designed CRAC, a software tool specialized in RNA-Seq processing. By integrating its own mapping process, CRAC is able to distinguish biological phenomena from sequencing errors. Moreover, it makes it possible to identify chimeric RNAs, which may be weakly expressed in a transcriptome and are inherently complex to detect because their fragments originate from different places on the genome.
Cited literature : 196 references

https://tel.archives-ouvertes.fr/tel-00842810
Contributor : Nicolas Philippe
Submitted on : Tuesday, July 9, 2013 - 2:49:48 PM
Last modification on : Friday, May 15, 2020 - 12:22:03 PM
Long-term archiving on : Thursday, October 10, 2013 - 4:10:13 AM

Identifiers

  • HAL Id : tel-00842810, version 1

Citation

Nicolas Philippe. Développement de méthodes et d'algorithmes pour la caractérisation et l'annotation des transcriptomes avec les séquenceurs haut débit. Bio-Informatique, Biologie Systémique [q-bio.QM]. Université Montpellier II - Sciences et Techniques du Languedoc, 2011. French. ⟨tel-00842810⟩

Metrics

Record views : 917

File downloads : 4740