
Filtrage sémantique et gestion distribuée de flux de données massives

Abstract : Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional databases (Relational Database Management Systems) can be very expensive in terms of time and memory resources. To respond effectively to the need for rapid decision-making, these streams require real-time processing. Data Stream Management Systems (DSMSs) evaluate queries on the most recent data of a stream within structures called windows. The input data come in different formats such as CSV, XML, RSS, or JSON. This heterogeneity barrier stems from the nature of data streams and must be overcome. To this end, several research groups have leveraged semantic web technologies (RDF and SPARQL) by proposing RDF stream processing systems, called RSPs. However, large volumes of RDF data, high input stream rates, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly sampling approach for RDF graph streams is proposed to reduce data and processing load while preserving semantic links. This approach is extended with a graph-oriented summarization approach that extracts the most relevant information from RDF graphs using centrality measures from social network analysis. We also adopt a compressed format for RDF data and propose an approach for querying compressed RDF data without a decompression phase.
To ensure parallel and distributed data stream management, the presented work also proposes two solutions for reducing the processing load in a distributed environment. These include an engine and approaches for the parallel and distributed processing of RDF graph streams. Finally, an optimized processing approach for operations combining static and dynamic data is integrated into a new distributed RDF graph stream management system.

Cited literature [191 references]
Contributor : Abes Star
Submitted on : Tuesday, May 19, 2020 - 6:51:55 AM
Last modification on : Tuesday, May 26, 2020 - 7:45:54 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02612248, version 1


Amadou Fall Dia. Filtrage sémantique et gestion distribuée de flux de données massives. Databases [cs.DB]. Sorbonne Université, 2018. French. ⟨NNT : 2018SORUS495⟩. ⟨tel-02612248⟩


