Theses

Équilibrage bi-stochastique des matrices pour la détection de structures par blocs et applications

Luce Le Gorrec 1
1 IRIT-APO - Algorithmes Parallèles et Optimisation
IRIT - Institut de recherche en informatique de Toulouse
Abstract : Detecting block structures in matrices is an important challenge. It arises first in data analysis, where matrices are a key tool for representing data, either as data tables or as adjacency matrices. For data tables, finding a co-clustering is equivalent to finding a row and column block structure of the matrix. For adjacency matrices, finding a structure of dominant diagonal blocks yields a clustering of the data. Block structure detection is also useful for solving linear systems. For instance, it helps to build efficient Block Jacobi preconditioners, or to find groups of rows that are strongly decorrelated so that a solver such as Block Cimmino can be applied. In this dissertation, we focus on detecting dominant diagonal block structures by symmetrically permuting the rows and columns of matrices. Many algorithms have been designed to highlight such structures. Among them, spectral algorithms play a key role. They can be divided into two kinds. The first kind consists of algorithms that project the matrix rows onto a low-dimensional space spanned by the leading eigenvectors of the matrix, and then apply a procedure such as k-means to the reduced data. Their main drawback is that the number of clusters to uncover must be known in advance. The second kind consists of iterative procedures that, at step k, look for the k-th best partition of the matrix into two subblocks. However, if the matrix structure has more than two blocks, the best partition into two blocks may fit the ground-truth structure of the matrix poorly. Hence, we propose a spectral algorithm that addresses both issues. To that end, we preprocess the matrix with a doubly-stochastic scaling, which highlights the blocks. We first show the benefits of such a scaling by using it as a preprocessing step for the Louvain algorithm, in order to uncover community structures in networks.
We also investigate several global modularity measures designed to quantify the consistency of a block structure. We generalise them so that they can handle doubly-stochastic matrices, and we observe that our scaling tends to unify these measures. Then, we describe our algorithm, which is based on spectral elements of the scaled matrix. Our method is built on the principle that the leading singular vectors of a doubly-stochastic matrix should have a staircase pattern when their coordinates are sorted in increasing order, provided that the matrix has a hidden block structure. Tools from signal processing, initially designed to detect jumps in signals, are applied to the sorted vectors in order to detect the steps in these vectors, and thus to find the separations between the blocks. However, these tools are not specifically designed for this purpose. Hence, we also describe the procedures that we have implemented to address the issues encountered. We then propose three applications of matrix block structure detection. The first two are community detection in networks and the design of efficient Block Jacobi type preconditioners for solving linear systems. For these applications, we compare the results of our algorithm with those of algorithms designed specifically for these tasks. The third is dialogue act detection in a discourse, using the STAC database, which consists of a chat between online players of "The Settlers of Catan". To that end, we connect classical clustering algorithms with a BiLSTM neural network that preprocesses the dialogue units. Finally, we conclude with some preliminary remarks about extending our method to rectangular matrices.
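
Illustration (not taken from the thesis): the two ideas at the core of the abstract, doubly-stochastic scaling followed by a search for steps in a sorted leading singular vector, can be sketched on a toy matrix. This is only a minimal sketch under stated assumptions. The scaling below uses the classical Sinkhorn-Knopp iteration, which may differ from the scaling algorithm used in the dissertation, and the "largest gap" rule is a simple stand-in for the signal-processing jump detectors the abstract mentions. The function name `sinkhorn_knopp`, the toy matrix and all parameter values are hypothetical.

```python
import numpy as np


def sinkhorn_knopp(A, tol=1e-10, max_iter=10_000):
    """Scale a positive matrix A to P = diag(r) @ A @ diag(c), with P
    approximately doubly stochastic (assumed stand-in scaling method)."""
    A = np.asarray(A, dtype=float)
    r = np.ones(A.shape[0])
    for _ in range(max_iter):
        c = 1.0 / (A.T @ r)  # after this update, column sums equal 1
        r = 1.0 / (A @ c)    # after this update, row sums equal 1
        P = r[:, None] * A * c[None, :]
        # Rows are balanced by construction, so only columns are checked.
        if np.abs(P.sum(axis=0) - 1.0).max() < tol:
            break
    return P


# Toy symmetric matrix with two hidden blocks, {0, 2, 5} and {1, 3, 4}:
# strong weights inside a block, weak weights across blocks.
labels = np.array([0, 1, 0, 1, 1, 0])
B = np.where(labels[:, None] == labels[None, :], 1.0, 0.05)

# Hide the structure behind uneven row/column weights.
d = np.arange(1.0, 7.0)
A = d[:, None] * B * d[None, :]

P = sinkhorn_knopp(A)

# The first singular vector of a doubly-stochastic matrix is constant,
# so the block information is carried by the following ones.
U, s, Vt = np.linalg.svd(P)
v = U[:, 1]

# Sort the coordinates and cut at the largest jump between neighbours.
order = np.argsort(v)
jump = int(np.argmax(np.diff(v[order])))
blocks = [order[: jump + 1].tolist(), order[jump + 1 :].tolist()]
print(blocks)
```

On this toy example the sorted second singular vector has a single step, so one cut recovers the two blocks. With more blocks, or with noisy data, several singular vectors and a more robust step detector would be needed, which is the situation the dissertation addresses.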

Cited literature [152 references]

https://tel.archives-ouvertes.fr/tel-02735291
Contributor : Abes Star
Submitted on : Tuesday, June 2, 2020 - 3:24:09 PM
Last modification on : Tuesday, June 30, 2020 - 3:40:41 AM

File

2019TOU30136b.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02735291, version 1

Citation

Luce Le Gorrec. Équilibrage bi-stochastique des matrices pour la détection de structures par blocs et applications. Réseaux et télécommunications [cs.NI]. Université Paul Sabatier - Toulouse III, 2019. Français. ⟨NNT : 2019TOU30136⟩. ⟨tel-02735291⟩

Metrics

Record views : 62
File downloads : 56