Integrating Heterogeneous Data Sources in the Web of Data

Franck Michel 1, 2, 3
3 WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée , Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract : To a great extent, the success of the Web of Data depends on the ability to reach out to legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing various kinds of structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever. NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. Given such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries on the native database. In the latter case, we follow a two-step approach. The first step translates a SPARQL query into a pivot abstract query based on the xR2RML mapping of the target database to RDF. In the second step, the abstract query is translated into a concrete query, taking into account the specificities of the database query language. Particular attention is paid to query optimization opportunities, at both the abstract and the concrete levels. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store. We have validated the method using a real-life use case in Digital Humanities.

Cited literature [118 references]

https://hal.archives-ouvertes.fr/tel-01508602
Contributor : Franck Michel
Submitted on : Friday, November 10, 2017 - 9:46:50 PM
Last modification on : Monday, November 5, 2018 - 3:52:10 PM
Long-term archiving on : Sunday, February 11, 2018 - 3:54:36 PM

File

PhD Manuscript.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Identifiers

  • HAL Id : tel-01508602, version 2

Citation

Franck Michel. Integrating Heterogeneous Data Sources in the Web of Data. Databases [cs.DB]. Université Côte d'Azur, 2017. English. ⟨tel-01508602v2⟩

Metrics

  • Record views : 118
  • File downloads : 1036