Contributions à la modélisation et la conception des systèmes de gestion de provenance à large échelle [Contributions to the modelling and design of large-scale provenance management systems]

Abstract : Provenance is key metadata for assessing the trustworthiness of electronic documents. It makes it possible to prove the quality and reliability of their content. With the maturation of service-oriented technologies and Cloud computing, more and more data is exchanged electronically, and dematerialization has become one of the key concepts for cost reduction and efficiency improvement. Although most applications that exchange and process documents on the Web or in the Cloud are becoming provenance-aware and provide heterogeneous, decentralized and non-interoperable provenance data, most Provenance Management Systems (PMSs) are dedicated either to a specific application (workflow, database, ...) or to a specific data type. Those systems were not designed to support provenance over distributed and heterogeneous sources. As a result, end users face different provenance models and different query languages. For these reasons, modelling, collecting and querying provenance across heterogeneous distributed sources is considered a challenging task today, as is designing scalable PMSs that provide these features. In the first part of our thesis, we focus on provenance modelling. We present a new provenance modelling approach based on semantic Web technologies. Our approach makes it possible to import provenance data from heterogeneous sources and to enrich it semantically to obtain a high-level representation of provenance. It provides syntactic interoperability between those sources based on a minimal domain model (MDM) and supports the construction of rich domain models, which allows high-level representations of provenance while preserving semantic interoperability. Our modelling approach also supports semantic correlation between different provenance sources and allows the use of a high-level semantic query language. In the second part of our thesis, we focus on the design, implementation and scalability issues of provenance management systems.
Based on our modelling approach, we propose a centralized logical architecture for PMSs. Then, we present a mediator-based architecture for PMSs that aims to preserve the distribution of provenance sources. Within this architecture, the mediator has a global view of all provenance sources and possesses query processing and distribution capabilities. Our modelling approach was validated in a document archival context within Novapost, a company offering SaaS services for document archiving. We also propose a non-functional validation that tests the scalability of our architecture. This validation is based on two implementations of our PMS: the first uses an RDF triple store (Sesame) and the second a NoSQL DBMS coupled with the map-reduce parallel model (CouchDB). The tests we performed show the limits of Sesame in storing and querying large amounts of provenance data. However, the PMS based on CouchDB showed good performance and linear scalability.
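As an illustration only, the mediator idea described in the abstract can be sketched in a few lines of Python. This is not the thesis's implementation: the thesis relies on semantic Web technologies, whereas this sketch uses plain dataclasses as a stand-in for a minimal domain model. All names here (`ProvenanceRecord`, `Source`, `Mediator`, the adapters and the sample rows) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal common shape shared by all sources."""
    document_id: str
    action: str
    agent: str
    timestamp: str  # ISO 8601 string, so lexical order matches time order


class Source:
    """One provenance source plus the adapter that normalizes its native rows."""

    def __init__(self, name: str, rows: Iterable[Any],
                 adapter: Callable[[Any], ProvenanceRecord]) -> None:
        self.name = name
        self.rows = list(rows)
        self.adapter = adapter

    def query(self, document_id: str) -> List[ProvenanceRecord]:
        # Each source answers in the common model, whatever its native format.
        normalized = (self.adapter(row) for row in self.rows)
        return [r for r in normalized if r.document_id == document_id]


class Mediator:
    """Distributes a query to every source and merges the answers."""

    def __init__(self, sources: Iterable[Source]) -> None:
        self.sources = list(sources)

    def history(self, document_id: str) -> List[ProvenanceRecord]:
        merged: List[ProvenanceRecord] = []
        for source in self.sources:
            merged.extend(source.query(document_id))
        return sorted(merged, key=lambda r: r.timestamp)


# Two made-up sources with different native formats (assumed for illustration).
workflow_rows = [
    {"doc": "D1", "op": "create", "user": "alice", "at": "2012-01-01T09:00:00"},
    {"doc": "D1", "op": "sign", "user": "bob", "at": "2012-01-02T10:00:00"},
    {"doc": "D2", "op": "create", "user": "carol", "at": "2012-01-02T11:00:00"},
]
archive_rows = [
    ("D1", "archive", "archiver", "2012-01-03T08:30:00"),
]

mediator = Mediator([
    Source("workflow", workflow_rows,
           lambda d: ProvenanceRecord(d["doc"], d["op"], d["user"], d["at"])),
    Source("archive", archive_rows,
           lambda t: ProvenanceRecord(t[0], t[1], t[2], t[3])),
])

for record in mediator.history("D1"):
    print(record.timestamp, record.agent, record.action)
```

The sources stay where they are and keep their native formats; only the adapters know about them, and the caller sees a single model and a single query entry point.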
Cited literature : 62 references

https://tel.archives-ouvertes.fr/tel-00762641
Contributor : Abes Star
Submitted on : Friday, December 7, 2012 - 3:27:13 PM
Last modification on : Wednesday, October 14, 2020 - 4:04:43 AM
Long-term archiving on : Monday, March 11, 2013 - 11:40:26 AM

File

PhD_vf_SAKKA-Mohamed-Amin-2.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00762641, version 1

Citation

Mohamed Amin Sakka. Contributions à la modélisation et la conception des systèmes de gestion de provenance à large échelle. Architecture, aménagement de l'espace. Institut National des Télécommunications, 2012. Français. ⟨NNT : 2012TELE0023⟩. ⟨tel-00762641⟩

Metrics

Record views : 1220

File downloads : 2099