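Caractérisation des erreurs de séquençage non aléatoires : application aux mosaïques et tumeurs hétérogènes (Characterization of non-random sequencing errors: application to mosaics and heterogeneous tumors)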

Abstract : Next-generation DNA sequencing technologies have revolutionized personalized genomics through their resolution and low cost. However, these technologies have a relatively high error rate, between 0.1% and 1% for second-generation sequencers. This rate is problematic when searching for variants with a low allelic ratio, such as those observed in heterogeneous tumors, because it can produce thousands of false positives. Each region of the studied DNA must therefore be sequenced several times, and the variants are then filtered according to depth-based criteria. Despite these filters, the number of errors remains significant, which shows the limits of conventional approaches and indicates that some sequencing errors are not random. In this thesis, we developed an exact algorithm for discovering over-represented degenerate DNA motifs located upstream of non-random sequencing errors, and thus potentially linked to their appearance. The algorithm was implemented in a software tool called DiNAMO, which was tested on sequencing data from IonTorrent and Illumina technologies. The experimental results revealed several motifs specific to each of these two technologies. We then showed that taking these motifs into account in the analysis significantly reduced the false-positive rate. DiNAMO can therefore be used downstream of an analysis as an additional filter to improve the identification of variants, especially those with a low allelic ratio.
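
The two ideas in the abstract can be illustrated with a small Python sketch. It is not DiNAMO's algorithm or interface: the function names, the toy sequences, and the motif `GGH` are all hypothetical, and the comparison of simple fractions stands in for a proper over-representation test. It only shows how a per-base error rate translates into an expected number of erroneous calls, and how a degenerate (IUPAC) motif can be checked against the bases immediately upstream of suspected error positions and of control positions.

```python
# Illustrative sketch only; not DiNAMO's algorithm or interface.
# All names and data below are hypothetical.

# Standard IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expected_errors(n_bases, error_rate):
    """Expected number of erroneous base calls among n_bases sequenced bases."""
    return n_bases * error_rate


def matches(motif, window):
    """True if window spells one of the sequences encoded by the degenerate motif."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def upstream_fraction(motif, windows):
    """Fraction of upstream windows whose last len(motif) bases match the motif."""
    if not windows:
        return 0.0
    k = len(motif)
    hits = sum(matches(motif, w[-k:]) for w in windows)
    return hits / len(windows)


# Toy data (assumed): the 5 bases immediately upstream of suspected
# error positions, and of control positions without errors.
error_windows = ["ACGGT", "TTGGC", "CAGGA", "GTCGT"]
control_windows = ["ACGTA", "TTGGC", "CATCA", "GTCAT"]

motif = "GGH"  # hypothetical degenerate motif: G, G, then A, C or T

if __name__ == "__main__":
    # At a 0.1% error rate, one million sequenced bases yield about
    # a thousand erroneous calls on average.
    print(expected_errors(1_000_000, 0.001))
    print(upstream_fraction(motif, error_windows))
    print(upstream_fraction(motif, control_windows))
```

In this toy data the motif matches a larger share of the error windows than of the control windows, which is the kind of signal an over-representation analysis looks for. A real analysis would need a statistical test and far more positions.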
Document type :
Theses

Cited literature : 199 references

https://tel.archives-ouvertes.fr/tel-02012610
Contributor : ABES STAR
Submitted on : Friday, February 8, 2019 - 6:43:05 PM
Last modification on : Saturday, June 4, 2022 - 3:27:06 AM
Long-term archiving on : Thursday, May 9, 2019 - 3:21:09 PM

File

2018LILUS014.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02012610, version 1

Citation

Chadi Saad. Caractérisation des erreurs de séquençage non aléatoires : application aux mosaïques et tumeurs hétérogènes. Médecine humaine et pathologie. Université de Lille, 2018. Français. ⟨NNT : 2018LILUS014⟩. ⟨tel-02012610⟩

Metrics

Record views

117

Files downloads

1590