RDF Data Interlinking : evaluation of Cross-lingual Methods

Tatiana Lesnikova 1
1 EXMO - Computer mediated exchange of structured knowledge
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph in which resources are nodes labelled in natural languages. One of the key challenges of linked data is discovering links across RDF data sets: given two data sets, equivalent resources should be identified and linked by owl:sameAs links. This problem is particularly difficult when resources are described in different natural languages.

This thesis investigates the effectiveness of linguistic resources for interlinking RDF data sets. For this purpose, we introduce a general framework in which each RDF resource is represented as a virtual document containing the textual information of neighboring nodes; the context of a resource consists of the labels of those nodes. Once virtual documents are created, they are projected into the same space so that they can be compared. This can be achieved by using machine translation or multilingual lexical resources. Once documents are in the same space, similarity measures are applied to find identical resources, and similarity between elements of this space is taken as similarity between RDF resources.

We experimentally evaluated cross-lingual techniques for linking RDF data within the proposed framework. In particular, two strategies are explored: applying machine translation or using references to multilingual resources. Overall, the evaluation shows the effectiveness of cross-lingual string-based approaches for linking RDF resources expressed in different languages. The methods have been evaluated on resources in English, Chinese, French and German. The best performance (over 0.90 F-measure) was obtained by the machine translation approach. This shows that the similarity-based method can be successfully applied to RDF resources independently of their type (named entities or thesauri concepts). The best experimental results involving just a pair of languages demonstrated the usefulness of such techniques for interlinking RDF resources cross-lingually.
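The pipeline described in the abstract (virtual documents, projection into a common space, similarity, evaluation by F-measure) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy triples, the `en:`/`fr:`/`ex:` identifiers, the word-for-word dictionary `FR_TO_EN` standing in for machine translation, the choice of cosine similarity over term counts, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

LABEL = "rdfs:label"

# Toy graphs as (subject, predicate, object) triples; all identifiers are made up.
EN_TRIPLES = [
    ("en:Paris", LABEL, "Paris"),
    ("en:Paris", "ex:capitalOf", "en:France"),
    ("en:France", LABEL, "France"),
    ("en:Berlin", LABEL, "Berlin"),
    ("en:Berlin", "ex:capitalOf", "en:Germany"),
    ("en:Germany", LABEL, "Germany"),
]
FR_TRIPLES = [
    ("fr:Paris", LABEL, "Paris"),
    ("fr:Paris", "ex:capitaleDe", "fr:France"),
    ("fr:France", LABEL, "France"),
    ("fr:Berlin", LABEL, "Berlin"),
    ("fr:Berlin", "ex:capitaleDe", "fr:Allemagne"),
    ("fr:Allemagne", LABEL, "Allemagne"),
]

# Hypothetical stand-in for machine translation: a tiny French-to-English lexicon.
FR_TO_EN = {"allemagne": "germany"}


def labels_of(node, triples):
    """Return the label strings attached to a node."""
    return [o for s, p, o in triples if s == node and p == LABEL]


def virtual_document(resource, triples):
    """Collect the resource's labels and those of its direct neighbors as tokens."""
    words = labels_of(resource, triples)
    for s, p, o in triples:
        if s == resource and p != LABEL:
            words.extend(labels_of(o, triples))
    return " ".join(words).lower().split()


def translate(tokens, dictionary):
    """Project tokens into the target language; unknown words pass through."""
    return [dictionary.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def interlink(source, target, dictionary, threshold=0.8):
    """Link each source resource to its most similar target resource."""
    target_docs = {
        r: virtual_document(r, target) for r in sorted({s for s, _, _ in target})
    }
    links = {}
    for res in sorted({s for s, _, _ in source}):
        doc = translate(virtual_document(res, source), dictionary)
        best = max(target_docs, key=lambda r: cosine(doc, target_docs[r]))
        if cosine(doc, target_docs[best]) >= threshold:
            links[res] = best  # candidate owl:sameAs link
    return links


def f_measure(found, reference):
    """Harmonic mean of precision and recall over a reference set of links."""
    correct = len(set(found.items()) & set(reference.items()))
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    links = interlink(FR_TRIPLES, EN_TRIPLES, FR_TO_EN)
    for source, target in links.items():
        print(source, "owl:sameAs", target)
```

In this sketch the neighborhood is limited to direct outgoing links, and a dictionary lookup replaces a real translation system; a wider neighborhood or a multilingual lexical resource would slot into `virtual_document` and `translate` respectively.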
Document type :
Theses
Cited literature : 137 references

https://tel.archives-ouvertes.fr/tel-01366030
Contributor : Abes Star
Submitted on : Wednesday, September 14, 2016 - 8:07:07 AM
Last modification on : Tuesday, July 16, 2019 - 1:26:49 AM
Long-term archiving on : Thursday, December 15, 2016 - 12:24:55 PM

File

LESNIKOVA_2016_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01366030, version 1

Collections

STAR | LIG | UGA | INRIA

Citation

Tatiana Lesnikova. RDF Data Interlinking : evaluation of Cross-lingual Methods. Artificial Intelligence [cs.AI]. Université Grenoble Alpes, 2016. English. ⟨NNT : 2016GREAM011⟩. ⟨tel-01366030⟩
