
Network and machine learning approaches to dengue omics data

Abstract : The last 20 years have seen the emergence of powerful measurement technologies that enable omics analysis of diverse diseases. These technologies often provide non-invasive means to study the etiology of newly emerging complex diseases, such as dengue, a mosquito-borne infectious disease. My dissertation concentrates on adapting and applying network and machine learning approaches to genomic and transcriptomic data. The first part goes beyond a previously published genome-wide analysis of 4,026 individuals by applying network analysis to find groups of interacting genes in a gene functional interaction network that, taken together, are associated with severe dengue. In this part, I first recalculated association p-values of sequence polymorphisms, then mapped polymorphisms to functionally related genes, and finally explored different pathway and gene interaction databases to find groups of genes that are jointly associated with severe dengue. The second part of my dissertation presents a theoretical approach to studying the size bias of active network search algorithms. My theoretical analysis suggests that the best score of subnetworks of a given size should be size-normalized, based on the hypothesis that it is a sample from an extreme value distribution, and not a sample from the normal distribution, as usually assumed in the literature. I then suggest a theoretical solution to this bias. The third part introduces a new subnetwork search tool that I co-designed. Its underlying model and the corresponding efficient algorithm avoid the size bias found in existing methods and generate easily comprehensible results. I present an application to transcriptomic dengue data. In the fourth and last part, I describe the identification of a biomarker that detects dengue severity outcome upon arrival at the hospital using a novel machine learning approach. This approach combines two-dimensional monotonic regression with feature selection. The underlying model goes beyond the commonly used linear approaches while allowing control over the number of transcripts in the biomarker. The small number of transcripts, along with the visual representation of the biomarker, maximizes its understanding and interpretability by biomedical professionals. I present an 18-gene biomarker with high and robust predictive performance that distinguishes severe dengue patients from non-severe ones upon arrival at the hospital. The predictive performance of the biomarker has been confirmed on two datasets that used different transcriptomic technologies and different blood cell subtypes.
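The size-bias argument in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the method developed in the dissertation. It assumes that node scores are independent standard normal z-scores, that random node subsets can stand in for connected subnetworks, and that the number of candidate subsets is a free parameter. All function names and parameter values are hypothetical. It shows that the best aggregate score among many candidates is shifted well away from a standard normal, and it standardizes an observed best score against an empirical null of best scores, used here as a simple stand-in for a theoretical extreme value correction.

```python
import numpy as np


def best_subset_score(node_z, size, n_candidates, rng):
    """Return the best aggregate z-score among random candidate subsets.

    ASSUMPTION: random node subsets stand in for connected subnetworks.
    """
    # Ranking random keys row by row gives one random subset per candidate.
    keys = rng.random((n_candidates, node_z.size))
    subsets = np.argsort(keys, axis=1)[:, :size]
    # For one fixed subset, sum(z) / sqrt(size) follows N(0, 1) under the null.
    aggregate = node_z[subsets].sum(axis=1) / np.sqrt(size)
    return aggregate.max()


def null_best_scores(n_nodes, size, n_candidates, n_trials, rng):
    """Simulate the null distribution of the best score (hypothetical helper)."""
    return np.array([
        best_subset_score(rng.standard_normal(n_nodes), size, n_candidates, rng)
        for _ in range(n_trials)
    ])


def size_normalize(observed_best, null_best):
    """Standardize a best score against the null of best scores of the same size."""
    return (observed_best - null_best.mean()) / null_best.std(ddof=1)


rng = np.random.default_rng(0)
# Same subset size, different numbers of candidates (illustrative values).
null_few = null_best_scores(n_nodes=100, size=5, n_candidates=10, n_trials=300, rng=rng)
null_many = null_best_scores(n_nodes=100, size=5, n_candidates=200, n_trials=300, rng=rng)

print("mean best score, 10 candidates: ", null_few.mean())
print("mean best score, 200 candidates:", null_many.mean())
```

Because the number of candidate subnetworks in a real network changes with subnetwork size, the null distribution of the best score shifts with size as well, so comparing raw best scores across sizes favors the sizes with more candidates. Standardizing each best score against a null built for its own size removes that shift in this toy setting.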

Cited literature [154 references]
Contributor : Abes Star
Submitted on : Thursday, January 2, 2020 - 1:55:35 AM
Last modification on : Saturday, July 11, 2020 - 4:46:31 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02426271, version 1



Iryna Nikolayeva. Network and machine learning approaches to dengue omics data. Bioengineering. Université Sorbonne Paris Cité, 2017. English. ⟨NNT : 2017USPCB032⟩. ⟨tel-02426271⟩


