
Scaling out-of-core k-nearest neighbors computation on single machines

Javier Olivares 1 
1 ASAP - As Scalable As Possible: foundations of large scale dynamic distributed systems
Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : K-Nearest Neighbors (KNN) is an efficient method for finding similar items within a large dataset. Over the years, a huge number of applications have used KNN to discover similarities within data generated in areas as diverse as business, medicine, music, and computer science. Although years of research have produced several approaches to this algorithm, implementing it remains a challenge, particularly today, when data is growing at unprecedented rates. In this context, running KNN on large datasets raises two major issues: huge memory footprints and very long runtimes. Because of these high costs in computational resources and time, state-of-the-art KNN works do not consider that data can change over time; they assume that the data remains static throughout the computation, which does not reflect reality. In this thesis, we address these challenges. First, we propose an out-of-core approach to compute KNN on large datasets using a single commodity PC. We advocate this approach as an inexpensive way to scale the KNN computation compared with the high cost of a distributed algorithm, in terms of both computational resources and coding, debugging, and deployment effort. Second, we propose a multithreaded out-of-core approach to address the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing KNN on large datasets: they leverage the restricted resources of a single machine, decrease runtimes compared with the baselines, and scale the computation on both static and dynamic datasets.

Cited literature [129 references]
Contributor : ABES STAR
Submitted on : Tuesday, February 7, 2017 - 10:17:19 AM
Last modification on : Saturday, June 25, 2022 - 7:44:04 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01421362, version 2


Javier Olivares. Scaling out-of-core k-nearest neighbors computation on single machines. Data Structures and Algorithms [cs.DS]. Université Rennes 1, 2016. English. ⟨NNT : 2016REN1S073⟩. ⟨tel-01421362v2⟩


