Theses

Novel approaches to the clustering of large graphs

Abstract : Graphs are ubiquitous in many fields of research, ranging from sociology to biology. A graph is a very simple mathematical structure that consists of a set of elements, called nodes, connected to each other by edges. Yet it is able to represent complex systems such as protein-protein interactions or scientific collaborations. Graph clustering is a central problem in the analysis of graphs; its objective is to identify dense groups of nodes that are sparsely connected to the rest of the graph. These groups of nodes, called clusters, are fundamental to an in-depth understanding of graph structures. There is no universal definition of what a good cluster is, and different approaches might be best suited for different applications. Whereas most classic methods focus on finding node partitions, i.e. on coloring graph nodes so that each node has one and only one color, more elaborate approaches are often necessary to model the complex structure of real-life graphs and to address sophisticated applications. In particular, in many cases, we must consider that a given node can belong to more than one cluster. Besides, many real-world systems exhibit multi-scale structures, and one must seek hierarchies of clusters rather than flat clusterings. Furthermore, graphs often evolve over time and are too massive to be handled in one batch, so that one must be able to process streams of edges. Finally, in many applications, processing entire graphs is irrelevant or expensive, and it can be more appropriate to recover local clusters in the neighborhood of nodes of interest rather than color all graph nodes. In this work, we study alternative approaches and design novel algorithms to tackle these different problems.
The novel methods that we propose to address these problems are mostly inspired by variants of modularity, a classic measure that assesses the quality of a node partition, and by random walks, stochastic processes whose properties are closely related to the graph structure. We provide analyses that give theoretical guarantees for the different proposed techniques, and endeavour to evaluate these algorithms on real-world datasets and use cases.
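As an illustration of the modularity measure mentioned above, the following is a minimal sketch of the standard Newman-Girvan modularity for an undirected, unweighted graph. It is not taken from the thesis, which studies variants of this measure; the function name `modularity` and the edge-list and label-dictionary representation are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Standard Newman-Girvan modularity of a node partition (illustrative sketch).

    edges  -- list of (u, v) pairs of an undirected, unweighted graph
    labels -- dict mapping each node to its cluster label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    degree = defaultdict(int)    # number of edge endpoints per node
    internal = defaultdict(int)  # number of edges inside each cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1

    volume = defaultdict(int)    # sum of node degrees per cluster
    for node, d in degree.items():
        volume[labels[node]] += d

    # Q = sum over clusters of (fraction of internal edges
    #     minus the fraction expected under random rewiring with fixed degrees)
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(toy_edges, toy_labels))
```

On this toy graph, placing each triangle in its own cluster gives a modularity of 5/14 (about 0.357), whereas putting all six nodes in a single cluster gives 0, which matches the intuition that dense groups with few edges between them score higher.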
Cited literature : 157 references

https://hal.archives-ouvertes.fr/tel-01987048
Contributor : Abes Star
Submitted on : Wednesday, March 4, 2020 - 11:32:10 AM
Last modification on : Thursday, October 29, 2020 - 3:01:50 PM
Long-term archiving on : Friday, June 5, 2020 - 2:18:21 PM

File

Hollocou-2018-These.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01987048, version 2

Citation

Alexandre Hollocou. Novel approaches to the clustering of large graphs. Social and Information Networks [cs.SI]. Université Paris sciences et lettres, 2018. English. ⟨NNT : 2018PSLEE063⟩. ⟨tel-01987048v2⟩

Metrics

Record views : 220
File downloads : 238