Theses

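Modélisation pangénomique du déséquilibre de liaison à l'aide de réseaux bayésiens hiérarchiques latents et applications [Genome-wide modeling of linkage disequilibrium using latent hierarchical Bayesian networks, and applications]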

Abstract : Recent high-throughput genomic technologies have opened the way for association studies that aim to characterize, genome-wide, the genetic factors involved in complex genetic diseases such as asthma and diabetes. In these studies, linkage disequilibrium (LD), which reflects complex dependences within genetic data, plays a central role because it enables genetic factors to be localized precisely. However, the high complexity of LD and the high dimensionality of genetic data are both major difficulties. The research in this PhD thesis was carried out in this context, and its contribution is twofold: theoretical and applied. On the theoretical side, we propose a new approach to LD modeling based on a model from artificial intelligence and machine learning, the forest of hierarchical latent class models (FHLCM). Its most significant contributions are the ability to account for the fuzzy nature of LD and to organize the multiple degrees of LD into a hierarchy. A novel scalable learning algorithm, named CFHLC, was developed in two versions: the first addresses scalability by splitting the genome into contiguous windows, and the second (CFHLC+), more recent and more advanced, slides a window along the chromosome. On a real dataset, a comparison of CFHLC with other methods showed that CFHLC models LD more accurately. In addition, learning on data with varying LD patterns showed that FHLCM faithfully reproduces the LD structure. Finally, an empirical analysis of learning complexity showed that running time grows linearly with the number of variables to process.
On the applied side, we explored two research avenues: causal discovery, and global and intuitive visualization of LD. First, we carried out a systematic study of the ability of FHLCM to support causal discovery in the context of genetic association. This work laid the basis for the development of novel methods for identifying causal genetic factors in genome-wide association studies. Second, we developed a method for the global and intuitive visualization of LD in three main contexts that geneticists may encounter: short-range, long-range and genome-wide LD. This method offers several advantages: (i) pairwise LD (two variables) and multilocus LD (more than two variables) are displayed simultaneously, (ii) short-range and long-range LD are easily distinguished, and (iii) information is summarized hierarchically.
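For readers unfamiliar with the pairwise LD mentioned in the abstract, the following is a minimal sketch of the standard two-locus measures D and r², computed from phased haplotypes coded 0/1. It is not the FHLCM or CFHLC method from the thesis, which models multilocus LD with latent variables; the function name `pairwise_ld` and the toy vectors `snp1`, `snp2`, `snp3` are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def pairwise_ld(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r2) for two biallelic loci coded 0/1 over phased haplotypes.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("need two non-empty allele vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                     # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                     # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n      # frequency of haplotype 1-1
    d = p_ab - p_a * p_b                                     # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d, d * d / denom


# Toy haplotypes (made-up data): one entry per chromosome copy.
snp1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp2 = [0, 0, 1, 1, 0, 1, 0, 0]   # mostly co-inherited with snp1
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of snp1 in this sample

if __name__ == "__main__":
    print(pairwise_ld(snp1, snp2))
    print(pairwise_ld(snp1, snp3))
```

On this toy data, `snp1` and `snp2` give an r² of about 0.6, while `snp1` and `snp3` give an r² of 0. A pairwise measure like this covers only two variables at a time, which is the limitation that multilocus models address.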

Cited literature : 115 references

https://tel.archives-ouvertes.fr/tel-00628759
Contributor : Raphaël Mourad
Submitted on : Tuesday, October 4, 2011 - 10:43:11 AM
Last modification on : Friday, October 23, 2020 - 4:45:22 PM
Long-term archiving on : Thursday, January 5, 2012 - 2:22:28 AM

Identifiers

  • HAL Id : tel-00628759, version 1

Citation

Raphaël Mourad. Modélisation pangénomique du déséquilibre de liaison à l'aide de réseaux bayésiens hiérarchiques latents et applications. Sciences du Vivant [q-bio]. Université de Nantes, 2011. Français. ⟨tel-00628759⟩

Metrics

Record views : 526
File downloads : 3350