SPARQL distributed query processing over linked data

Abdoul Macina 1, 2
1 WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée, Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract : Driven by Semantic Web standards, data providers publish and interlink an increasing number of RDF data sources on the Web, forming a large distributed linked data network. However, exploiting these data sources is challenging for data consumers because of the distribution of the data, its growing volume, and the autonomy of the sources. In the Linked Data context, federation engines query these distributed data sources by relying on Distributed Query Processing (DQP) techniques. Nevertheless, a naive implementation of the DQP approach may generate a very large number of remote requests to the data sources, along with numerous intermediate results, leading to costly network communication. Furthermore, the semantics of distributed queries is often overlooked. Query expressiveness, data partitioning, and data replication are further challenges to take into account. To address these challenges, we first proposed in this thesis a SPARQL- and RDF-compliant Distributed Query Processing semantics that preserves the expressiveness of the SPARQL language. We then presented several strategies for a federated query engine that transparently addresses distributed data sources while managing data partitioning, completeness of query results, data replication, and query processing performance. We implemented and evaluated our approach and optimization strategies in a federated query engine to demonstrate their effectiveness.
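The cost of a naive DQP implementation mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the engine described in the thesis: the in-memory `SOURCES`, the helper `match`, and the function `naive_federated_eval` are all hypothetical names introduced here, and each loop iteration over a source merely stands in for one remote request. It evaluates a two-pattern query by sending every triple pattern, once per intermediate binding, to every source, and counts the simulated requests.

```python
# Hypothetical in-memory "endpoints": each source is a list of (s, p, o) triples.
SOURCES = {
    "source_a": [("alice", "knows", "bob"), ("alice", "knows", "carol")],
    "source_b": [("bob", "worksAt", "inria"), ("carol", "worksAt", "i3s")],
}


def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def naive_federated_eval(patterns, sources):
    """Naive evaluation: each pattern is sent to every source once per
    intermediate binding. Returns (solutions, simulated_request_count)."""
    requests = 0
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triples in sources.values():
                requests += 1  # stands in for one remote request
                for triple in triples:
                    extended = match(pattern, triple, binding)
                    if extended is not None:
                        next_solutions.append(extended)
        solutions = next_solutions
    return solutions, requests


query = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
solutions, requests = naive_federated_eval(query, SOURCES)
print(requests, solutions)
```

In this toy setting the request count grows with the number of intermediate bindings multiplied by the number of sources, which is the kind of overhead that source selection and query optimization strategies aim to reduce.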
Document type :
Theses

Cited literature [96 references]

https://tel.archives-ouvertes.fr/tel-02340700
Contributor : Abes Star
Submitted on : Thursday, October 31, 2019 - 1:02:43 AM
Last modification on : Friday, November 1, 2019 - 1:04:11 AM

File

2018AZUR4230.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02340700, version 1

Citation

Abdoul Macina. SPARQL distributed query processing over linked data. Other [cs.OH]. Université Côte d'Azur, 2018. English. ⟨NNT : 2018AZUR4230⟩. ⟨tel-02340700⟩
