
Development of new algorithmic methods for processing UMI from high-throughput sequencing data.

Abstract : This thesis addresses the processing of data from next-generation sequencers, and more particularly short reads from second-generation sequencers. It focuses on the development of new methods based on unique molecular identifiers (UMI), short sequences used to label the initial DNA fragments and to improve the precision of the results.

First, in the field of transcriptomics, a new method was developed to improve the measurement of gene expression and to detect fusion transcripts in tumors. This method is based on RT-MLPA coupled to an NGS sequencer. It amplifies the RNA fragments from a tumor sample and yields the sequences of the analyzed fragments. The downstream analysis processes these sequences one by one in order, first, to assign each sequence to its sample and, second, to identify the gene it expresses. RT-MiS was developed for this purpose. RT-MiS performs the entire analysis, from the extraction and correction of the UMI in the sequences to the production of a gene expression matrix for each sample. RT-MiS also includes a dedicated analysis interface that lets users launch the tool easily. This interface automates the analysis process as much as possible and presents the results as interactive figures and graphs, which makes biological interpretation much easier.

Then, in the field of genomics, a new somatic variant detection tool was developed. UMI-VarCal is a UMI-based variant caller that uses UMI information to call variants efficiently in tumor samples. The benefit of using UMI information shows in the improved accuracy of variant detection, especially when the variant frequency falls below 1%. UMI-VarCal applies a Poisson test to filter out non-variant positions, then applies a UMI analysis and two complementary filters to remove false positives. UMI-VarCal was designed to be highly optimized, so that it remains more efficient than other tools in terms of both variant detection and execution time.

Finally, still in the field of variant detection, a new read simulator was developed. This tool, called UMI-Gen, is the first read simulator capable of generating sequences with UMI. In addition, UMI-Gen can insert somatic single-nucleotide variants (SNV) or structural copy number variants (CNV) into the simulated files. Furthermore, by analyzing a set of normal files, it can estimate the background noise in these samples and reproduce it in the simulated data. The simulated files can then be used to evaluate different variant callers, especially those that implement a UMI analysis in their algorithm.
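To illustrate the kind of Poisson test mentioned for UMI-VarCal, the sketch below flags a position as a variant candidate when the number of reads supporting the alternate allele is unlikely under background sequencing noise alone. This is not UMI-VarCal's implementation: the function names, the background error rate of 0.001, and the significance threshold of 0.01 are all assumptions chosen for illustration.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Sum P(X = i) for i in 0..k-1 in log space to avoid underflow,
    # then take the complement.
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def is_candidate(alt_count, depth, error_rate=0.001, alpha=0.01):
    """Hypothetical filter: keep a position if its alternate-allele count
    is significantly above what background noise would produce.

    error_rate and alpha are illustrative assumptions, not values
    taken from the thesis.
    """
    expected_errors = depth * error_rate  # Poisson mean under noise only
    p_value = poisson_upper_tail(alt_count, expected_errors)
    return p_value < alpha, p_value


# Example: a position covered by 2000 reads, 12 of which carry the variant.
keep, p = is_candidate(alt_count=12, depth=2000)
print(keep, p)
```

Positions that fail such a test would be discarded early, so that the more expensive UMI analysis and the remaining filters only run on the surviving candidates.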
Contributor : Abes Star
Submitted on : Tuesday, October 12, 2021 - 4:41:12 PM
Last modification on : Tuesday, October 19, 2021 - 5:34:13 PM
Long-term archiving on : Thursday, January 13, 2022 - 8:09:13 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03375337, version 1


Vincent Sater. Développement de nouvelles méthodes algorithmiques pour le traitement des UMI à partir des données de séquençage haut débit. Databases [cs.DB]. Normandie Université, 2021. French. ⟨NNT : 2021NORMR045⟩. ⟨tel-03375337⟩


