Identity Management in Knowledge Graphs

Abstract : In the absence of a central naming authority on the Web of data, it is common for different knowledge graphs to refer to the same thing by different names (IRIs). Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Such identity statements have strict logical semantics, indicating that every property asserted to one name, will also be inferred to the other, and vice versa. While such inferences can be extremely useful in enabling and enhancing knowledge-based systems such as search engines and recommendation systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of data. With several studies showing that owl:sameAs is indeed misused for different reasons, a proper approach towards the handling of identity links is required in order to make the Web of data succeed as an integrated knowledge space. This thesis investigates the identity problem at hand, and provides different, yet complementary solutions. Firstly, it presents the largest dataset of identity statements that has been gathered from the LOD Cloud to date, and a web service from which the data and its equivalence closure can be queried. Such resource has both practical impacts (it helps data users and providers to find different names for the same entity), as well as analytical value (it reveals important aspects of the connectivity of the LOD Cloud). In addition, by relying on this collection of 558 million identity statements, we show how network metrics such as the community structure of the owl:sameAs graph can be used in order to detect possibly erroneous identity assertions. For this, we assign an error degree for each owl:sameAs based on the density of the community(ies) in which they occur, and their symmetrical characteristics. One benefit of this approach is that it does not rely on any additional knowledge. Finally, as a way to limit the excessive and incorrect use of owl:sameAs, we define a new relation for asserting the identity of two ontology instances in a specific context (a sub-ontology). This identity relation is accompanied with an approach for automatically detecting these links, with the ability of using certain expert constraints for filtering irrelevant contexts. As a first experiment, the detection and exploitation of the detected contextual identity links are conducted on two knowledge graphs for life sciences, constructed in a mutual effort with domain experts from the French National Institute of Agricultural Research (INRA).
Document type :
Theses
Complete list of metadatas

https://hal.archives-ouvertes.fr/tel-02073961
Contributor : Abes Star <>
Submitted on : Monday, May 13, 2019 - 5:38:07 PM
Last modification on : Wednesday, May 15, 2019 - 4:52:37 AM

File

69941_RAAD_2018_archivage.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02073961, version 2

Citation

Joe Raad. Identity Management in Knowledge Graphs. Computation and Language [cs.CL]. Université Paris-Saclay, 2018. English. ⟨NNT : 2018SACLA028⟩. ⟨tel-02073961v2⟩

Share

Metrics

Record views

67

Files downloads

20