A Resource-Oriented Architecture for Integration and Exploitation of Linked Data

Pierre de Vettor 1, 2
2 SOC - Service Oriented Computing
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : In this thesis, we focus on the integration of raw data coming from heterogeneous and multi-origin data sources on the Web. The global objective is to provide a generic and adaptive architecture able to analyze and combine this heterogeneous, informal, and sometimes meaningless data into a coherent smart data set. We define smart data as significant, semantically explicit data, ready to be used to fulfill the stakeholders' objective. This work is motivated by a live scenario from the French company Audience Labs. In this report, we propose new models and techniques to adapt the combination and integration process to the diversity of data sources. We focus on transparency and dynamicity in data source management, scalability and responsiveness with respect to the number of data sources, adaptability to data source characteristics, and finally consistency of the produced data (coherent data, without errors or duplicates). In order to address these challenges, we first propose a meta-model to represent the variety of data source characteristics, related to access (URI, authentication), extraction (request format), or physical characteristics (volume, latency). Relying on this coherent formalization of data sources, we define different data access strategies in order to adapt access and processing to data source capabilities. With the help of these models and strategies, we propose a distributed, resource-oriented software architecture, where each component is freely accessible through REST via its URI. The orchestration of the different tasks of the integration process can be optimized with regard to data source and data characteristics. This information allows us to generate an adapted workflow, in which tasks are prioritized relative to one another in order to speed up the process and limit the quantity of data transferred.
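As a rough illustration of how a data source description could drive the choice of an access strategy, here is a minimal Python sketch. It is not taken from the thesis: the class name `DataSourceDescription`, the strategy labels (`cache`, `paginate`, `direct`), the thresholds, and the cost heuristic are all assumptions made for this example. Only the listed characteristics (URI, authentication, request format, volume, latency) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceDescription:
    """Hypothetical record of the characteristics named in the abstract."""
    uri: str                 # access
    requires_auth: bool      # access
    request_format: str      # extraction, e.g. "sparql" or "json"
    volume: int              # physical: approximate number of records
    latency_ms: float        # physical: typical response time


def choose_access_strategy(source: DataSourceDescription) -> str:
    """Pick an illustrative strategy; labels and thresholds are assumptions."""
    if source.latency_ms > 1000:
        return "cache"       # slow source: reuse previously fetched data
    if source.volume > 100_000:
        return "paginate"    # large source: fetch in bounded chunks
    return "direct"          # small, fast source: query on demand


def order_sources(sources):
    """Order sources cheapest first, using volume * latency as a rough cost."""
    return sorted(sources, key=lambda s: s.volume * s.latency_ms)
```

In this sketch the description is the only input to the decision, which mirrors the idea of adapting access and processing to what each source can support.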
In order to improve the data quality of our approach, we then focus on the data uncertainty that can appear in a Web context, and propose a model to represent it. We introduce the concept of a Web resource based on a probabilistic model, where each resource can have several possible representations, each with an associated probability. This approach will be the basis of a new architecture optimization that takes uncertainty into account during our combination process.
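The idea of a resource with several weighted representations can be sketched as follows. This is an illustrative reading of the abstract, not the model defined in the thesis: the names `Representation`, `UncertainResource`, and `combine` are hypothetical, and two simplifying assumptions are made, namely that the probabilities of one resource sum to 1 and that two resources being combined are independent.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Representation:
    """One possible state of a resource, with its probability."""
    value: str
    probability: float


class UncertainResource:
    """Hypothetical resource holding alternative representations."""

    def __init__(self, uri, representations):
        total = sum(r.probability for r in representations)
        # Assumption of this sketch: representations are exhaustive.
        if not isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")
        self.uri = uri
        self.representations = list(representations)

    def most_likely(self):
        """Return the representation with the highest probability."""
        return max(self.representations, key=lambda r: r.probability)


def combine(a, b):
    """Pair every representation of a with every one of b.

    Probabilities are multiplied, which is only valid under the
    independence assumption stated in the lead-in.
    """
    return [
        Representation(f"{ra.value}|{rb.value}", ra.probability * rb.probability)
        for ra in a.representations
        for rb in b.representations
    ]
```

Combining two resources this way multiplies the number of candidate representations, which hints at why the combination process needs to account for uncertainty explicitly.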
Document type : Theses

Cited literature : 25 references

https://tel.archives-ouvertes.fr/tel-01422057
Contributor : Abes Star
Submitted on : Friday, December 23, 2016 - 3:35:06 PM
Last modification on : Friday, May 17, 2019 - 10:28:01 AM
Long-term archiving on : Monday, March 20, 2017 - 11:08:47 PM

File

TH2016DEVETTORPIERRE.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01422057, version 1

Citation

Pierre de Vettor. A Resource-Oriented Architecture for Integration and Exploitation of Linked Data. Hardware Architecture [cs.AR]. Université de Lyon, 2016. English. ⟨NNT : 2016LYSE1176⟩. ⟨tel-01422057⟩
