Theses

Two statistical methods for graph model selection : Distance to the microcanonical ensemble and prequential inference on edge sequences

Louis Duvivier
Abstract : For the last 25 years, complex networks have been an active field of research. The size of the networks studied makes it necessary to model their structure in order to make it understandable. Many models have been put forward to do so, whether based on the nodes’ degrees, on a partition of the nodes into blocks, on an embedding in a latent space, etc. The estimation of those models’ parameters has shown how important it is to follow a rigorous statistical approach in order to avoid both overfitting and underfitting. In this thesis, we build on this previous work to develop two methodologies for evaluating the relevance of a model with respect to a given graph. First, we study the geometric structure of the microcanonical ensemble and highlight the existence of a characteristic radius for various models. This allows us to propose a statistical testing methodology, inspired by the principles of the p-value, to test a model. Then, to tackle the lack of observations relative to the number of model parameters, we focus on edge models. We propose a methodology to evaluate a model, based on the minimum description length. Its main benefits are its rigorous statistical grounding, which allows for an interpretation of the results, and a common formulation for models of different natures (the stochastic block model (SBM) and the configuration model, for example), which makes it possible to compare their performance on a graph. Throughout our work, we took care to follow a formal approach, which allowed us to prove various properties of the methodologies described, particularly the convergence of the estimators.

https://tel.archives-ouvertes.fr/tel-03670853
Contributor : ABES STAR
Submitted on : Tuesday, May 17, 2022 - 6:10:13 PM
Last modification on : Wednesday, May 18, 2022 - 3:40:15 AM

File

these.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03670853, version 1

Citation

Louis Duvivier. Two statistical methods for graph model selection : Distance to the microcanonical ensemble and prequential inference on edge sequences. Networking and Internet Architecture [cs.NI]. Université de Lyon, 2021. English. ⟨NNT : 2021LYSEI093⟩. ⟨tel-03670853⟩
