Matrix factorization framework for simultaneous data (co-)clustering and embedding

Abstract : Recent advances in computing, sensing and storage technology have created many high-volume, high-dimensional data sets. This increase in both the volume and the variety of data calls for advances in methodology to understand, process, summarize and extract information from such data. From a more technical point of view, understanding the structure of the large data sets arising from this data explosion is of fundamental importance in data mining and machine learning. Unlike supervised learning, unsupervised learning can provide generic tools for analyzing and summarizing these data sets when there is no well-defined notion of classes. In this thesis, we focus on three important techniques of unsupervised learning for data analysis, namely dimensionality reduction, data clustering and data co-clustering. Our major contribution is a novel way to perform clustering (resp. co-clustering) and dimensionality reduction simultaneously. The main idea is to consider an objective function that can be decomposed into two terms, where one term performs the dimensionality reduction while the other returns the clustering (resp. co-clustering) of the data in the projected space. We have further introduced regularized versions of our approaches with graph Laplacian embedding in order to better preserve the local geometry of the data. Experimental results on synthetic data as well as real data demonstrate that the proposed algorithms can provide good low-dimensional representations of the data while improving the clustering (resp. co-clustering) results. Motivated by the good results obtained by graph-regularized clustering (resp. co-clustering) methods, we developed a new algorithm based on multi-manifold learning.
We approximate the intrinsic manifold using a subset of candidate manifolds that can better reflect the local geometrical structure by making use of the graph Laplacian matrices. Finally, we have investigated the integration of selected instance-level constraints into the graph Laplacians of both data samples and data features. By doing so, we show how the addition of prior knowledge can assist data co-clustering and improve the quality of the obtained co-clusters.
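To make the two-term idea concrete, the following is a minimal sketch of what such a graph-regularized objective could look like. It is not taken from the thesis: the exact formulation, the constraints and the symbols `B` (embedding), `Q` (loadings), `G` (cluster indicator), `S` (centroids in the embedded space) and the weight `lam` are all assumptions made for illustration, and the code only evaluates the objective rather than optimizing it.

```python
import numpy as np

def knn_laplacian(X, n_neighbors=3):
    """Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between rows of X.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W

def joint_objective(X, B, Q, G, S, L, lam=0.1):
    """Hypothetical objective: embedding term + clustering term + graph term."""
    embed = np.linalg.norm(X - B @ Q.T, "fro") ** 2    # dimensionality reduction
    cluster = np.linalg.norm(B - G @ S, "fro") ** 2    # clustering in embedded space
    smooth = np.trace(B.T @ L @ B)                     # local-geometry regularizer
    return embed + cluster + lam * smooth

def centroids(G, B):
    """Least-squares centroids S for a fixed binary indicator matrix G."""
    return np.linalg.solve(G.T @ G, G.T @ B)

# Toy data (assumed): two well-separated groups of 5 points in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(3.0, 0.1, (5, 4))])

# Rank-2 embedding from a truncated SVD, used here as a simple stand-in.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
B, Q = U[:, :2] * s[:2], Vt[:2].T

L = knn_laplacian(X)
G_good = np.eye(2)[np.array([0] * 5 + [1] * 5)]   # matches the two groups
G_bad = np.eye(2)[np.array([0, 1] * 5)]           # mixes the two groups

good_value = joint_objective(X, B, Q, G_good, centroids(G_good, B), L)
bad_value = joint_objective(X, B, Q, G_bad, centroids(G_bad, B), L)
print(good_value < bad_value)
```

In this sketch only the clustering term changes between the two partitions, so the partition that matches the two groups gives the smaller objective value. An actual algorithm would alternate updates of the embedding and the partition instead of fixing the embedding in advance.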
Keywords : Data science

https://tel.archives-ouvertes.fr/tel-02179223
Contributor : Abes Star
Submitted on : Wednesday, July 10, 2019 - 3:30:37 PM
Last modification on : Friday, July 12, 2019 - 1:16:36 AM

File

va_Allab_Kais.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02179223, version 1

Citation

Kais Allab. Matrix factorization framework for simultaneous data (co-)clustering and embedding. Data Structures and Algorithms [cs.DS]. Université Sorbonne Paris Cité, 2016. English. ⟨NNT : 2016USPCB083⟩. ⟨tel-02179223⟩
