
A regularized approach of instances x variables co-clustering for exploratory data analysis

Abstract: Co-clustering is a class of unsupervised data analysis techniques that aim to extract the underlying dependency structure between the rows and columns of a data table in the form of homogeneous blocks, known as co-clusters. These techniques fall into two groups: those that simultaneously cluster the instances and variables, and those that cluster the values of two or more variables of a data set. Most of these techniques are limited to variables of the same type, and few scale to large data sets while still providing easily interpretable clusters and co-clusters. Among the existing value-based co-clustering approaches, MODL is suitable for processing large data sets with several numerical or categorical variables. In this thesis, we propose a value-based approach, inspired by MODL, to perform a simultaneous clustering of the instances and variables of a data set with potentially mixed-type variables. The proposed co-clustering model provides a Maximum A Posteriori based summary of the data that can be used as is for exploratory analysis. When the summary is large, exploratory analysis tools, such as model coarsening, can be used to simplify the co-clustering, which facilitates the interpretation of the results. We show that the proposed co-clustering approach can handle large data and extract easily interpretable clusters from mixed data with more than 10 million observations. We also show the robustness of the approach, its capacity to extract inter-dependence between the variables, and its good behavior in extreme cases such as pattern-less data and perfectly correlated variables.

Cited literature [122 references]

Contributor : Aichetou Bouchareb
Submitted on : Sunday, January 13, 2019 - 6:52:48 PM
Last modification on : Friday, May 6, 2022 - 4:50:07 PM
Long-term archiving on : Sunday, April 14, 2019 - 12:58:25 PM




  • HAL Id : tel-01979698, version 1



Aichetou Bouchareb. A regularized approach of instances x variables co-clustering for exploratory data analysis. Mathematics [math]. Université Paris 1 Panthéon-La Sorbonne, 2018. English. ⟨tel-01979698⟩


