
Integrating heterogeneous data sources in the Web of data

Franck Michel 1, 2
2 WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée , Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract : To a great extent, the success of the Web of Data depends on the ability to reach out to legacy data locked in silos that are inaccessible from the web. In the last 15 years, many works have tackled the problem of exposing various kinds of structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever. NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. Given such an xR2RML mapping, we propose either to materialize the RDF data or to dynamically evaluate SPARQL queries on the native database. In the latter case, we follow a two-step approach. The first step translates a SPARQL query into a pivot abstract query, based on the xR2RML mapping of the target database to RDF. The second step translates the abstract query into a concrete query, taking into account the specificities of the database query language. We pay particular attention to query optimization opportunities at both the abstract and the concrete levels. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store. We have validated the method using a real-life use case in Digital Humanities.
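The two-step approach described in the abstract can be pictured with a deliberately tiny sketch. This is not the xR2RML language or the thesis prototype: the `MAPPING` dictionary, the `TriplePattern` and `AbstractQuery` classes, and both translation functions are hypothetical names introduced here for illustration only. The sketch assumes a mapping that simply associates an RDF predicate with a document field, and it produces a MongoDB-style filter document without connecting to any database.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, highly simplified stand-in for a mapping:
# RDF predicate IRI -> field name in the stored documents.
# (An assumption for illustration; a real xR2RML mapping is far richer.)
MAPPING = {
    "http://xmlns.com/foaf/0.1/name": "name",
    "http://xmlns.com/foaf/0.1/age": "age",
}


@dataclass(frozen=True)
class TriplePattern:
    """One SPARQL triple pattern; terms starting with '?' are variables."""
    subject: str
    predicate: str
    obj: object


@dataclass(frozen=True)
class AbstractQuery:
    """Database-independent pivot form: a condition on one field."""
    field: str
    equals: Optional[object]  # None means "the field only has to exist"


def is_variable(term: object) -> bool:
    return isinstance(term, str) and term.startswith("?")


def to_abstract(tp: TriplePattern, mapping: dict) -> AbstractQuery:
    """Step 1: rewrite the triple pattern using the mapping."""
    field = mapping[tp.predicate]
    if is_variable(tp.obj):
        return AbstractQuery(field, None)
    return AbstractQuery(field, tp.obj)


def to_mongo_filter(query: AbstractQuery) -> dict:
    """Step 2: rewrite the abstract query as a MongoDB filter document."""
    if query.equals is None:
        return {query.field: {"$exists": True}}
    return {query.field: {"$eq": query.equals}}


if __name__ == "__main__":
    pattern = TriplePattern("?s", "http://xmlns.com/foaf/0.1/name", "Alice")
    print(to_mongo_filter(to_abstract(pattern, MAPPING)))
```

Keeping the pivot form separate from the concrete filter is what would let the first step be reused for a database other than MongoDB; only the second function depends on the target query language.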

Cited literature : 119 references
Contributor : Abes Star
Submitted on : Thursday, October 19, 2017 - 3:16:07 PM
Last modification on : Sunday, May 1, 2022 - 3:14:30 AM
Long-term archiving on : Saturday, January 20, 2018 - 1:52:02 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01508602, version 3



Franck Michel. Integrating heterogeneous data sources in the Web of data. Other [cs.OH]. Université Côte d'Azur, 2017. English. ⟨NNT : 2017AZUR4002⟩. ⟨tel-01508602v3⟩


