Gestion et exploitation de larges bases de connaissances en présence de données incomplètes et incertaines

Abstract : In the era of digitilization, and with the emergence of several semantic Web applications, many new knowledge bases (KBs) are available on the Web. These KBs contain (named) entities and facts about these entities. They also contain the semantic classes of these entities and their mutual links. In addition, multiple KBs could be interconnected by their entities, forming the core of the linked data web. A distinctive feature of these KBs is that they contain millions to trillions of unreliable RDF triples. This uncertainty has multiple causes. It can result from the integration of data sources with various levels of intrinsic reliability or it can be caused by some considerations to preserve confidentiality. Furthermore, it may be due to factors related to the lack of information, the limits of measuring equipment or the evolution of information. The goal of this thesis is to improve the usability of modern systems aiming at exploiting uncertain KBs. In particular, this work proposes cooperative and intelligent techniques that could help the user in his decision-making when his query returns unsatisfactory results in terms of quantity or reliability. First, we address the problem of failing RDF queries (i.e., queries that result in an empty set of responses).This type of response is frustrating and does not meet the user’s expectations. The approach proposed to handle this problem is query-driven and offers a two fold advantage: (i) it provides the user with a rich explanation of the failure of his query by identifying the MFS (Minimal Failing Sub-queries) and (ii) it allows the computation of alternative queries called XSS (maXimal Succeeding Sub-queries), semantically close to the initial query, with non-empty answers. Moreover, from a user’s point of view, this solution offers a high level of flexibility given that several degrees of uncertainty can be simultaneously considered.In the second contribution, we study the dual problem to the above problem (i.e., queries whose execution results in a very large set of responses). Our solution aims at reducing this set of responses to enable their analysis by the user. Counterparts of MFS and XSS have been defined. They allow the identification, on the one hand, of the causes of the problem and, on the other hand, of alternative queries whose results are of reasonable size and therefore can be directly and easily used in the decision making process.All our propositions have been validated with a set of experiments on different uncertain and large-scale knowledge bases (WatDiv and LUBM). We have also used several Triplestores to conduct our tests.
Document type :
Theses
Complete list of metadatas

Cited literature [166 references]  Display  Hide  Download

https://tel.archives-ouvertes.fr/tel-02452333
Contributor : Abes Star <>
Submitted on : Thursday, January 23, 2020 - 2:15:07 PM
Last modification on : Friday, January 24, 2020 - 1:02:17 AM

File

2019ESMA0016_dellal.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02452333, version 1

Collections

Citation

Ibrahim Dellal. Gestion et exploitation de larges bases de connaissances en présence de données incomplètes et incertaines. Autre [cs.OH]. ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d'Aérotechique - Poitiers, 2019. Français. ⟨NNT : 2019ESMA0016⟩. ⟨tel-02452333⟩

Share

Metrics

Record views

66

Files downloads

71