Multimodal exploration of human genome sequencing to solve the unsolved rare diseases

Abstract : Rare diseases are individually rare but collectively frequent, with more than 7% of living adults affected by one of the 6000 currently described diseases. An estimated 72% of rare diseases are genetic in origin. Since the next-generation sequencing (NGS) revolution, the bottleneck in rare disease diagnosis is no longer the sequencing itself but the analysis of the massive amount of data it produces. Although genome sequencing is now accessible in routine clinical practice, the majority of patients suffering from rare diseases remain undiagnosed. Using bioinformatics and data science, my thesis project aimed to address current bottlenecks in genomic medicine in order to improve rare disease diagnosis. This manuscript focuses on the two main projects I led during this Ph.D. with SeqOne Genomics and CHU Grenoble Alpes. First, I tackled the challenge of reinterpreting previous sequencing analyses that remained unsolved. Such reinterpretation was performed manually, and the lack of human resources and automated methods made it difficult to apply in routine diagnosis. Taking advantage of ClinVar, a collaborative and dynamic database of shared variant interpretations, we developed Genome Alert!, an open-source automated method that monitors ClinVar and reassesses variant pathogenicity and symptom-gene associations every month. The reinterpretation of 4,929 analyses revealed 45 changes with potential clinical impact, leading to four additional diagnoses. This work represents a first large validation study of an automated system for reinterpreting sequencing data, which could become a standard in genomic medicine. Second, I explored the challenge of computing on clinical data, aiming to improve the use of medical coding, or physician phenotyping, in genomic analysis. We report the first study focusing on phenotyping practices in clinical sequencing analysis, analyzing the records of 1,686 patients from four international groups.
Despite the adoption of a common standard, the Human Phenotype Ontology (HPO), we found a highly heterogeneous approach to phenotyping in both the number and the choice of symptoms, even for the same patients. This fluctuating description is a major challenge that must be overcome before the clinical data in medical records can be exploited. As an illustration, fewer than half (43%) of the symptom-gene associations declared in the cohort were covered in public databases. Aiming to model the inductive medical reasoning that could explain the heterogeneity of phenotyping across clinical observations, we developed methods based on the association of symptoms with the same genetic disorder. Using graph algorithms and collaborative filtering, we trained a symptom interaction model that projects clinical descriptions in HPO format, spanning 16,600 symptoms, into a space of interacting symptoms comprising 390 groups and 1,131,886 pairs of symptoms associated in diseases. This model filled in the missing pieces of incomplete clinical descriptions, achieving 99.8% coverage of the medical observations with knowledge from the medical literature. To evaluate its clinical relevance, we applied this symptom interaction model to phenotype-driven gene prioritization in the cohort and improved diagnostic performance by 42% compared to the best current competitor. This method should enable discoveries in precision medicine by standardizing clinical descriptions. With the work described in this manuscript, I hope to have contributed to raising awareness of genomic medicine in the community and to have provided technical solutions that improve care for patients with rare diseases.
Contributor : ABES STAR
Submitted on : Thursday, November 24, 2022 - 3:59:08 PM
Last modification on : Saturday, November 26, 2022 - 3:42:20 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03870153, version 1




Kévin Yauy. Multimodal exploration of human genome sequencing to solve the unsolved rare diseases. Development Biology. Université Grenoble Alpes [2020-..], 2022. English. ⟨NNT : 2022GRALV058⟩. ⟨tel-03870153⟩


