Sur deux problèmes d’apprentissage automatique : la détection de communautés et l’appariement adaptatif

Lennart Gulikers 1, 2, 3
2 DYOGENE - Dynamics of Geometric Networks
Inria de Paris, CNRS - Centre National de la Recherche Scientifique : UMR 8548, DI-ENS - Département d'informatique de l'École normale supérieure
3 INFINE - INFormation NEtworks
Inria Saclay - Ile de France
Abstract : In this thesis, we study two problems of machine learning: (I) community detection and (II) adaptive matching.

I) It is well known that many networks exhibit a community structure. Finding those communities helps us understand and exploit general networks. We focus on community detection using so-called spectral methods, which are based on the eigenvectors of carefully chosen matrices, and we analyse their performance on artificially generated benchmark graphs. Instead of the classical Stochastic Block Model (which does not allow for much degree heterogeneity), we consider a Degree-Corrected Stochastic Block Model (DC-SBM) with weighted vertices, which can generate a wide class of degree sequences. We consider this model in both a dense and a sparse regime. In the dense regime, we show that an algorithm based on a suitably normalized adjacency matrix correctly classifies all but a vanishing fraction of the nodes. In the sparse regime, we show that the availability of only a small amount of information entails the existence of an information-theoretic threshold below which no algorithm performs better than random guessing. On the positive side, we show that an algorithm based on the non-backtracking matrix works all the way down to the detectability threshold in the sparse regime, which demonstrates the robustness of the algorithm. This follows from a precise characterization of the non-backtracking spectrum of sparse DC-SBMs. We further perform tests on well-known real networks.

II) Online two-sided matching markets such as Q&A forums and online labour platforms critically rely on the ability to propose adequate matches based on imperfect knowledge of the two parties to be matched. We develop a model of a task/server matching system for efficient platform operation in the presence of such uncertainty. For this model, we give a necessary and sufficient condition for an incoming stream of tasks to be manageable by the system. We further identify a so-called back-pressure policy under which the throughput that the system can handle is optimized. We show that this policy achieves strictly larger throughput than a natural greedy policy. Finally, we validate our model and confirm our theoretical findings with experiments based on user-contributed content on an online platform.
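As an illustration of the spectral approach in part (I), the following is a minimal Python sketch, not the thesis's implementation. It builds the non-backtracking matrix of a small undirected graph (rows and columns indexed by directed edges) and labels vertices by the sign of an eigenvector summed over incoming edges. The function names, the two-community sampler, the `weights` parameter, and the choice of summing over incoming edges are all assumptions made for this example; the dense eigendecomposition is only practical for small graphs.

```python
import numpy as np


def non_backtracking_matrix(edges):
    """Non-backtracking matrix of an undirected simple graph.

    Rows/columns are indexed by directed edges; the entry for
    (u->v, v->w) is 1 exactly when w != u (no immediate backtracking).
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    out = {}
    for x, y in directed:
        out.setdefault(x, []).append(y)
    B = np.zeros((len(directed), len(directed)))
    for (u, v), i in index.items():
        for w in out.get(v, []):
            if w != u:
                B[i, index[(v, w)]] = 1.0
    return B, directed


def spectral_labels(edges, n):
    """Two-community guess from the eigenvector of the second-largest
    eigenvalue (in modulus) of B. Summing eigenvector entries over the
    incoming edges of each vertex is one convention, assumed here."""
    B, directed = non_backtracking_matrix(edges)
    vals, vecs = np.linalg.eig(B)
    order = np.argsort(-np.abs(vals))
    v2 = np.real(vecs[:, order[1]])
    score = np.zeros(n)
    for i, (_, v) in enumerate(directed):
        score[v] += v2[i]
    return np.where(score >= 0, 1, 0)


def sample_two_block_sbm(n, a, b, weights=None, seed=0):
    """Hypothetical sampler: vertices alternate between two communities;
    edge (u, v) appears with probability w_u * w_v * a / n inside a
    community and w_u * w_v * b / n across (capped at 1).
    weights=None gives the plain, non-degree-corrected model."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % 2
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            rate = a if labels[u] == labels[v] else b
            if rng.random() < min(1.0, w[u] * w[v] * rate / n):
                edges.append((u, v))
    return edges, labels
```

A typical use would be to call `sample_two_block_sbm(200, 8, 1)`, pass the resulting edges to `spectral_labels`, and compare the guess with the planted labels up to a global swap of the two community names. How well this works depends on the parameters relative to the detectability threshold discussed above.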

Cited literature [40 references]

https://tel.archives-ouvertes.fr/tel-01834967
Contributor : Abes Star
Submitted on : Wednesday, July 11, 2018 - 10:00:13 AM
Last modification on : Tuesday, February 5, 2019 - 2:38:01 PM
Document(s) archived on : Friday, October 12, 2018 - 2:15:03 PM

File

Gulikers-2017-These.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01834967, version 1

Citation

Lennart Gulikers. Sur deux problèmes d’apprentissage automatique : la détection de communautés et l’appariement adaptatif. Machine Learning [stat.ML]. PSL Research University, 2017. Français. ⟨NNT : 2017PSLEE062⟩. ⟨tel-01834967⟩

Metrics

Record views : 276

Files downloads : 76