Skip to Main content Skip to Navigation

Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation

Abstract : This thesis deals with the characterisation of key genes in gene expression regulation, called transcription factors, in the plant Arabidopsis thaliana. Using expression data, our biological goal is to cluster transcription factors in groups of co-regulator transcription factors, and in groups of co-regulated transcription factors. To do so, we propose a two-step procedure. First, we infer the network of regulation between transcription factors. Second, we cluster transcription factors based on their connexion patterns to other transcriptions factors.From a statistical point of view, the transcription factors are the variables and the samples are the observations. The regulatory network between the transcription factors is modelled using a directed graph, where variables are nodes. The estimation of the nodes can be interpreted as a problem of variables selection. To infer the network, we perform LASSO type penalised linear regression. A preliminary approach selects a set of variable along the regularisation path using penalised likelihood criterion. However, this approach is unstable and leads to select too many variables. To overcome this difficulty, we propose to put in competition two selection procedures, designed to deal with high dimension data and mixing linear penalised regression and subsampling. Parameters estimation of the two procedures are designed to lead to select stable set of variables. Stability of results is evaluated on simulated data under a graphical model. Subsequently, we use an unsupervised clustering method on each inferred oriented graph to detect groups of co-regulators and groups of co-regulated. To evaluate the proximity between the two classifications, we have developed an index of comparaison of pairs of partitions whose relevance is tested and promoted. From a practical point of view, we propose a cascade simulation method required to respect the model complexity and inspired from parametric bootstrap, to simulate data under our model. We have validated our model by inspecting the proximity between the two classifications on simulated and real data.
Complete list of metadata

Cited literature [68 references]  Display  Hide  Download
Contributor : ABES STAR :  Contact
Submitted on : Friday, January 19, 2018 - 4:08:07 PM
Last modification on : Wednesday, September 28, 2022 - 3:07:22 PM
Long-term archiving on: : Thursday, May 24, 2018 - 7:42:44 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01688715, version 1


Yann Vasseur. Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation. Statistiques [math.ST]. Université Paris-Saclay, 2017. Français. ⟨NNT : 2017SACLS475⟩. ⟨tel-01688715⟩



Record views


Files downloads