Theses

Novel components at the periphery of long read genome assembly tools

Pierre Marijon 1, 2, 3 
1 BONSAI - Bioinformatics and Sequence Analysis
Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189, CNRS - Centre National de la Recherche Scientifique
Abstract : Sequencing genetic information provides a better understanding of many biological phenomena, e.g. genetic diseases, speciation events, and fundamental mechanisms of cell function. Sequencing techniques have evolved considerably since the Sanger method (1977). Nowadays, third-generation sequencing technologies greatly reduce the cost of sequencing complete genomes. They produce longer reads (sequence fragments), but require dedicated assembly tools that account for the high error rates of those reads. A study of the methods used by third-generation read assembly pipelines revealed that assemblies could be improved without modifying the assembly tools themselves. This thesis therefore proposes several improvements, implemented as publicly available tools. yacrd and fpa pre-process the set of reads prior to assembly, in order to improve the efficiency and quality of the assembly process. knot combines information from both the input reads and an assembly, in order to provide insights into how to improve the contiguity of that assembly.

Cited literature [135 references]

https://tel.archives-ouvertes.fr/tel-02441360
Contributor : Pierre Marijon
Submitted on : Wednesday, January 15, 2020 - 5:34:53 PM
Last modification on : Thursday, March 24, 2022 - 3:42:53 AM
Long-term archiving on : Thursday, April 16, 2020 - 4:42:24 PM

File

Th_se.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02441360, version 1

Citation

Pierre Marijon. Novel components at the periphery of long read genome assembly tools. Computer Science [cs]. University of Lille, 2019. English. ⟨tel-02441360⟩

Metrics

Record views : 184
File downloads : 362