
Learning Co-Similarities for the Automatic Clustering of Single-View and Multi-View Data

Abstract: Machine learning consists of designing computer programs capable of learning from their environment or from data. Different kinds of learning exist, depending on what the program learns and in which context it learns, which naturally gives rise to different tasks. Similarity measures play a predominant role in most of these tasks, which is why this thesis focuses on their study. More specifically, we focus on data clustering, a so-called unsupervised learning task in which the goal of the program is to organize a set of objects into several clusters so that similar objects are grouped together. In many applications, these objects (documents, for instance) are described by their links to other types of objects (words, for instance), which can be clustered as well. This case is referred to as co-clustering. In this thesis we study and improve the co-similarity algorithm XSim, and we demonstrate that these improvements enable the algorithm to outperform state-of-the-art methods. Moreover, these objects are often linked to more than one other type of object; data that describe these multiple relations between the various object types are called multiview. Classical methods are generally unable to take into account and exploit all the information contained in such data. For this reason, we present a new multiview similarity algorithm called MVSim, which can be considered a multiview extension of the XSim algorithm. We demonstrate that this method outperforms state-of-the-art multiview methods as well as classical approaches, thus validating the value of the multiview approach. Finally, we also describe how to use the MVSim algorithm to cluster large-scale single-view data by first splitting it into multiple subsets.
We demonstrate that this approach significantly reduces the running time and the memory footprint of the method, at the cost of a slight decrease in clustering quality compared to a straightforward approach with no splitting.
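The core idea behind co-similarity is that two documents can be similar because they use *similar* words, not only identical ones, and symmetrically for words. The sketch below illustrates this with a simplified iteration in the spirit of χ-Sim (XSim): document similarities are recomputed from word similarities and vice versa, each normalized by row or column sums so values stay in [0, 1]. It is a hypothetical illustration under stated assumptions (L1 row/column normalization, diagonal fixed to 1, a fixed number of iterations, no empty rows or columns), not the exact XSim variant or its improvements studied in the thesis; `co_similarity` and `n_iter` are illustrative names.

```python
import numpy as np


def co_similarity(A, n_iter=4):
    """Simplified co-similarity iteration (hypothetical sketch, not the thesis's XSim).

    A: (n_docs, n_words) non-negative document-word matrix.
    Returns (SR, SC): document-document and word-word similarity matrices.
    """
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1)
    col_sums = A.sum(axis=0)
    if np.any(row_sums == 0) or np.any(col_sums == 0):
        raise ValueError("empty rows or columns are not supported in this sketch")

    # Assumed normalization: divide by the product of the two L1 norms,
    # which keeps every entry in [0, 1] when the other matrix is in [0, 1].
    NR = 1.0 / np.outer(row_sums, row_sums)
    NC = 1.0 / np.outer(col_sums, col_sums)

    # Start from "identity" similarities: only identical items are similar.
    SR = np.eye(A.shape[0])
    SC = np.eye(A.shape[1])
    for _ in range(n_iter):
        # Documents are similar if their words are similar, and vice versa.
        SR_new = (A @ SC @ A.T) * NR
        SC_new = (A.T @ SR @ A) * NC
        # Assumption: self-similarity is fixed to 1.
        np.fill_diagonal(SR_new, 1.0)
        np.fill_diagonal(SC_new, 1.0)
        SR, SC = SR_new, SC_new
    return SR, SC


# Docs 0 and 1 share no word, but their words co-occur in doc 2;
# doc 3 uses a word that appears nowhere else.
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
SR, SC = co_similarity(A, n_iter=4)
print(np.round(SR, 3))
```

With a single iteration, documents 0 and 1 get a similarity of exactly 0 because they share no word; after further iterations the word-word similarity propagates and their similarity becomes positive, while the disconnected document 3 stays at 0. This higher-order propagation is what distinguishes co-similarity from a plain cosine or overlap measure.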

Cited literature: 13 references
Contributor: ABES STAR
Submitted on : Thursday, May 2, 2013 - 3:04:12 PM
Last modification on : Wednesday, July 6, 2022 - 4:22:07 AM
Long-term archiving on: Monday, August 19, 2013 - 1:45:10 PM


Version validated by the jury (STAR)


  • HAL Id : tel-00819840, version 1



Clément Grimal. Apprentissage de co-similarités pour la classification automatique de données monovues et multivues [Learning co-similarities for the automatic clustering of single-view and multi-view data]. Other [cs.OH]. Université de Grenoble, 2012. In French. ⟨NNT : 2012GRENM092⟩. ⟨tel-00819840⟩


