Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering

Abstract : This thesis focuses on how to efficiently perform unsupervised machine learning tasks, such as the closely related problems of nearest neighbor search and clustering, under time and space constraints for high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the throughput of data-independent Cross-polytope LSH for approximate nearest neighbor search, with almost no loss of accuracy. Second, a novel streaming data-dependent method is designed to learn compact binary codes from high-dimensional data points in only one pass. In addition to some theoretical guarantees, the quality of the obtained embeddings is assessed on the approximate nearest neighbor search task. Finally, a space-efficient, parameter-free clustering algorithm is devised, based on recovering an approximate Minimum Spanning Tree of the sketched data dissimilarity graph, on which suitable cuts are performed.
Document type : Theses

Cited literature : 199 references

https://tel.archives-ouvertes.fr/tel-01982476
Contributor : Abes Star
Submitted on : Tuesday, January 15, 2019 - 4:40:12 PM
Last modification on : Friday, November 6, 2020 - 3:43:07 AM

File

TheseFinale-MORVAN.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01982476, version 1

Citation

Anne Morvan. Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering. Machine Learning [cs.LG]. Université Paris sciences et lettres, 2018. English. ⟨NNT : 2018PSLED033⟩. ⟨tel-01982476⟩

Metrics

Record views : 782

File downloads : 521