Assemblage de novo de répétitions à partir de données NGS (De novo assembly of repeats from NGS data)

Abstract: The development of next-generation sequencing methods has made it possible to generate vast amounts of data at lower cost and in less time. However, the fragments obtained, called reads, are shorter and have higher error rates than those obtained with the first sequencing methods. This new type of data has created new challenges in genome assembly. Even though many assembly tools are published every year and algorithms are becoming increasingly complex, reconstructing a whole genome de novo, that is, in the absence of a reference genome, remains a difficult problem. One of the main causes is the presence of repetitive regions in genomes. This thesis describes algorithms designed to improve the de novo assembly of repeats. We first present our solutions focused on tandem repeats. The first algorithm, called DExTaR, aims to extend the work done by a de novo assembler in the detection of exact tandem repeats. Based on a de Bruijn graph constructed by an assembler, our approach assembles new exact tandem repeats by analysing the parts of the graph left unresolved. The second algorithm, called MixTaR, performs only local assemblies in order to detect exact and approximate tandem repeats. Using the two types of reads produced by the new sequencing methods, short and long reads, MixTaR does not require a global de novo assembly. We then propose several algorithms for simplifying the assembly problem based on a new data structure, the paired de Bruijn graph. This graph uses paired-end information from the beginning of the assembly process to achieve better repeat detection and higher-quality results.
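
To make the link between de Bruijn graphs and tandem repeats concrete, here is a minimal sketch of a textbook de Bruijn graph built from reads. It is not the DExTaR or MixTaR algorithm, and the function names `build_de_bruijn` and `has_cycle_through`, the toy read, and the choice k = 4 are all assumptions made for illustration. It only shows why an exact tandem repeat tends to appear as a cycle in such a graph, which is the kind of structure an assembler may leave unresolved.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_cycle_through(graph, start):
    """Return True if some directed path leads from `start` back to itself."""
    stack = list(graph.get(start, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


# Hypothetical toy read: the motif ACG repeated three times, with flanking bases.
reads = ["TTACGACGACGGA"]
graph = build_de_bruijn(reads, k=4)

print(has_cycle_through(graph, "ACG"))  # node inside the repeated motif
print(has_cycle_through(graph, "TTA"))  # node in the flanking sequence
```

In this toy case the node `ACG` lies on the cycle ACG, CGA, GAC, while the flanking node `TTA` does not lie on any cycle. Real reads contain errors and uneven coverage, so a practical method would need considerably more than a cycle check.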
Cited literature [211 references]

https://tel.archives-ouvertes.fr/tel-02521411
Contributor: Guillaume Fertin
Submitted on: Friday, March 27, 2020 - 1:40:13 PM
Last modification on: Thursday, April 2, 2020 - 1:49:26 AM
Long-term archiving on: Sunday, June 28, 2020 - 2:17:17 PM

File

Thèse de Doctorat. Andreea RA...
Files produced by the author(s)

Identifiers

  • HAL Id: tel-02521411, version 1

Citation

Andreea Radulescu. Assemblage de novo de répétitions à partir de données NGS. Computer Science [cs]. Université de Nantes (UNAM), 2015. French. ⟨tel-02521411⟩

Metrics

Record views: 56
File downloads: 175