Skip to Main content Skip to Navigation

Développement de méthodes et d'algorithmes pour la caractérisation et l'annotation des transcriptomes avec les séquenceurs haut débit

Abstract : Since their introduction, high-throughput sequencers have revolutionized transcriptomic studies at genome scale. Indeed, they have the ability to generate millions, or even billions of short sequences, called reads. New transcriptomic approaches, such as Digital Gene Expression (DGE) and RNA-sequencing (RNA-Seq), enable the identification, quantification, and reconstitution of all transcripts of the cell, even rare ones. Among these transcripts are regulatory non-coding RNAs, alternative splice variants, which code for novel proteins, but also non colinear transcripts termed chimeras (generated by either gene fusion or trans-splicing). The characterization of these transcripts constitutes a sheer algorithmic,but also a biological challenge due to their differences in nature, their diverse implications in physiological and cellular processes, and for some their role in cancer development.In this work, we focus on algorithms and methods for the characterization and annotation of transcriptomes. First, we proposed a statistical study on DGE to assess the impact of sequence errors on the analysis. Therefrom, we developed a pipeline for the DGE annotation. Through this initial work,we demonstrated that a lot of information is shared between the reads. This property led us to design, the Gk arrays, an indexing data structure for organizing huge amounts of reads in memory and algorithms to quickly query this structure. Finally, based on the Gk arrays we have conceived, CRAC,a software specialised in the RNA-Seq processing. By integrating its own mapping process, CRAC is able to distinguish the biological phenomena from sequence errors. Moreover, it allows to identify chimeric RNAs, which may be weakly expressed in a transcriptome and are inherently complex to detect since their fragments originate from different places on the genome.
Complete list of metadatas

Cited literature [196 references]  Display  Hide  Download
Contributor : Nicolas Philippe <>
Submitted on : Tuesday, July 9, 2013 - 2:49:48 PM
Last modification on : Friday, May 15, 2020 - 12:22:03 PM
Long-term archiving on: : Thursday, October 10, 2013 - 4:10:13 AM


  • HAL Id : tel-00842810, version 1



Nicolas Philippe. Développement de méthodes et d'algorithmes pour la caractérisation et l'annotation des transcriptomes avec les séquenceurs haut débit. Bio-Informatique, Biologie Systémique [q-bio.QM]. Université Montpellier II - Sciences et Techniques du Languedoc, 2011. Français. ⟨tel-00842810⟩



Record views


Files downloads