Theses

Linked data quality : completeness and conciseness

Abstract : The widespread adoption of Semantic Web technologies such as the Resource Description Framework (RDF) enables individuals to build their databases on the Web, to write vocabularies, and to define rules that arrange and explain the relationships between data according to the Linked Data principles. As a consequence, a large amount of structured and interlinked data is generated daily. A close examination of the quality of this data is critical, especially when important research and professional decisions depend on it. The quality of Linked Data is an important indicator of its fitness for use in applications. Several dimensions for assessing the quality of Linked Data have been identified, such as accuracy, completeness, provenance, and conciseness. This thesis focuses on assessing the completeness and enhancing the conciseness of Linked Data. In particular, we first propose a completeness calculation approach based on a generated schema. Since a reference schema is required to assess completeness, we propose a mining-based approach to derive a suitable schema (i.e., a set of properties) from the data. This approach distinguishes between essential properties and marginal ones to generate, for a given dataset, a conceptual schema that meets the user's expectations regarding data completeness constraints. We implemented a prototype called “LOD-CM” to illustrate the process of deriving a conceptual schema of a dataset based on the user's requirements. We further propose an approach to discover equivalent predicates in order to improve the conciseness of Linked Data. In addition to a statistical analysis, this approach relies on a deep semantic analysis of the data and on learning algorithms. We argue that studying the meaning of predicates can help to improve the accuracy of the results. Finally, a set of experiments was conducted on real-world datasets to evaluate the proposed approaches.
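
The idea of deriving a schema from the data and then measuring completeness against it can be illustrated with a minimal sketch. This is not the algorithm of the thesis or the LOD-CM implementation: the function names (`derive_schema`, `completeness`), the `min_support` parameter, and the toy film data are all hypothetical, and a simple per-property frequency threshold stands in for the mining step that the abstract mentions. Properties used by at least `min_support` of the entities are treated as essential, and the rest as marginal.

```python
from collections import Counter


def derive_schema(entities, min_support):
    """Return the properties used by at least `min_support` (a fraction
    between 0 and 1) of the entities. Simplified stand-in for a mining step."""
    if not entities:
        return set()
    counts = Counter(p for props in entities.values() for p in set(props))
    n = len(entities)
    return {p for p, c in counts.items() if c / n >= min_support}


def completeness(entities, schema):
    """Average, over all entities, of the fraction of schema properties
    that each entity actually has."""
    if not entities or not schema:
        return 0.0
    total = sum(len(schema & set(props)) / len(schema)
                for props in entities.values())
    return total / len(entities)


# Hypothetical toy data: each entity maps to the set of properties it uses.
films = {
    "film1": {"director", "starring", "runtime"},
    "film2": {"director", "starring"},
    "film3": {"director", "music"},
    "film4": {"director", "starring", "runtime"},
}

schema = derive_schema(films, min_support=0.75)
score = completeness(films, schema)
print(sorted(schema), score)
```

With a threshold of 0.75, only `director` and `starring` count as essential in this toy data, and the dataset scores 0.875 against that schema. Lowering the threshold admits more properties into the schema, which generally lowers the completeness score; this is one way a user's completeness expectations could shape the derived schema.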
Cited literature [150 references]

https://tel.archives-ouvertes.fr/tel-02513652
Contributor : Abes Star
Submitted on : Friday, March 20, 2020 - 6:57:06 PM
Last modification on : Sunday, March 22, 2020 - 1:17:19 AM

File

TheseISSA.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02513652, version 1

Citation

Subhi Issa. Linked data quality : completeness and conciseness. Artificial Intelligence [cs.AI]. Conservatoire national des arts et metiers - CNAM, 2019. English. ⟨NNT : 2019CNAM1274⟩. ⟨tel-02513652⟩

Metrics

Record views

82

Files downloads

27