FreeCore : un système d'indexation de résumés de document sur une Table de Hachage Distribuée (DHT)

Abstract : This thesis examines the problem of indexing and searching in a Distributed Hash Table (DHT). It presents a distributed system that stores document summaries according to their content. Concretely, the thesis uses Bloom filters (BFs) to represent document summaries and proposes an efficient method for inserting and retrieving documents represented by BFs in an index distributed over a DHT. Content-based storage has a dual advantage: it groups similar documents together, and it allows them to be found and retrieved more quickly by using Bloom filters for keyword searches. However, processing a keyword query represented by a Bloom filter is a difficult operation and requires a mechanism for locating the Bloom filters that represent the documents stored in the DHT. The thesis therefore goes on to propose two Bloom filter index schemes distributed over a DHT. The first index system combines the principles of content-based indexing and inverted lists, and addresses the large amount of data stored by content-based indexes. By using long Bloom filters, this solution can store documents on a large number of servers and index them using less space. The second index system efficiently supports the processing of superset queries (keyword queries) using a prefix tree. This solution exploits the distribution of the data and proposes a configurable distribution function that allows documents to be indexed with a balanced binary tree, so that documents are distributed efficiently across the indexing servers. In addition, as a third solution, the thesis proposes an efficient method for locating documents that contain a set of keywords. Compared with solutions of the same category, this last solution performs subset searches at a lower cost and can be considered a solid foundation for superset query processing on over-DHT index systems.
Finally, the thesis proposes a prototype of a peer-to-peer system for content indexing and keyword search. This prototype, ready to be deployed in a real environment, was evaluated with PeerSim, which made it possible to measure the theoretical performance of the algorithms developed throughout the thesis.
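
The keyword matching described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the filter length `M`, the number of hash functions `K`, the salted SHA-256 hashing, and the function names are all assumptions chosen for illustration, and the index is a plain in-memory dictionary rather than a DHT. A document may match a query when every bit set in the query's Bloom filter is also set in the document's filter, that is, when the document filter is a superset of the query filter.

```python
import hashlib

M = 64  # filter length in bits (assumed value, for illustration only)
K = 3   # number of hash functions (assumed value)


def bit_positions(word, m=M, k=K):
    """Derive k bit positions for a keyword from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{word}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def bloom(words, m=M, k=K):
    """Build a Bloom filter, stored as a Python int used as a bit set."""
    bits = 0
    for w in words:
        for p in bit_positions(w, m, k):
            bits |= 1 << p
    return bits


def may_match(doc_filter, query_filter):
    """True if every bit set in the query is also set in the document filter."""
    return doc_filter & query_filter == query_filter


def search(index, keywords):
    """Return ids of documents whose filter is a superset of the query filter.

    `index` is a hypothetical in-memory dict {doc_id: filter}; a real system
    would spread these entries over DHT nodes.
    """
    q = bloom(keywords)
    return sorted(doc_id for doc_id, f in index.items() if may_match(f, q))


if __name__ == "__main__":
    index = {
        "doc1": bloom(["dht", "bloom", "index"]),
        "doc2": bloom(["prefix", "tree"]),
    }
    print(search(index, ["dht", "bloom"]))
```

Because Bloom filters admit false positives, the result always contains every document that holds all the query keywords but may also contain others; a candidate would normally be checked against the actual document afterwards.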
Cited literature [64 references]

https://tel.archives-ouvertes.fr/tel-01921587
Contributor : Abes Star
Submitted on : Friday, January 24, 2020 - 12:05:15 PM
Last modification on : Thursday, October 22, 2020 - 11:15:07 AM
Long-term archiving on : Saturday, April 25, 2020 - 2:29:20 PM

File

NGOM_Bassirou_these_2018.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01921587, version 2

Citation

Bassirou Ngom. FreeCore : un système d'indexation de résumés de document sur une Table de Hachage Distribuée (DHT). Recherche d'information [cs.IR]. Sorbonne Université; Université Cheikh Anta Diop de Dakar, 2018. Français. ⟨NNT : 2018SORUS180⟩. ⟨tel-01921587v2⟩

Metrics

Record views

144

File downloads

175