
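Enrichissement et peuplement d’ontologie à partir de textes et de données du LOD : Application à l’annotation automatique de documents (Ontology enrichment and population from texts and LOD data: application to the automatic annotation of documents)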

Abstract : This thesis presents an ontology-guided approach for annotating the documents of a corpus in which each document describes an entity of the same type. In our context, the documents have to be annotated with concepts that are usually too specific to be mentioned explicitly in the texts. In addition, the annotation concepts are initially represented only by their names, with no semantic information attached to them. Finally, the characteristics of the entities described in the documents are incomplete. To carry out this particular annotation process, we propose an approach called SAUPODOC (Semantic Annotation of Population Using Ontology and Definitions of Concepts), which combines several tasks to (1) populate and (2) enrich a domain ontology. The population step (1) adds to the ontology information drawn from the documents of the corpus and also from the Web of Data (Linked Open Data, or LOD). The LOD is today a promising source for many Semantic Web applications, provided that appropriate data acquisition techniques are developed. In SAUPODOC, ontology population has to take into account the diversity of the data in the LOD: properties may be multiple, equivalent, multi-valued or absent. Because the correspondences to be established between the vocabulary of the ontology to be populated and that of the LOD are complex, we propose a model to facilitate their specification. We then show how this model is used to automatically generate SPARQL queries, which facilitates querying the LOD and populating the ontology. Once populated, the ontology is enriched (2) with the annotation concepts and with definitions of these concepts that are learned from examples of annotated documents. Reasoning on these definitions finally provides the desired annotations.
Experiments were conducted in two application domains, and the results, compared with annotations obtained with classifiers, demonstrate the value of the approach.
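
The abstract describes generating SPARQL queries from a correspondence model, but does not show the model itself. The sketch below is only an illustration of the general idea, not the model defined in the thesis: the `Correspondence` class, the `build_query` function and the `example.org` IRIs are all hypothetical. It assumes that equivalent LOD properties can be combined with `UNION` and that possibly absent properties can be wrapped in `OPTIONAL`; multi-valued properties would simply yield several result rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Correspondence:
    """Hypothetical mapping from one ontology property to LOD properties."""
    target_property: str       # property name in the ontology to populate
    lod_properties: List[str]  # one or more equivalent LOD property IRIs


def build_query(entity_iri: str, correspondences: List[Correspondence]) -> str:
    """Build a SPARQL SELECT query for a single LOD entity.

    Equivalent LOD properties are combined with UNION, and each group is
    wrapped in OPTIONAL so that an absent property does not discard the row.
    """
    variables = []
    blocks = []
    for c in correspondences:
        var = "?" + c.target_property
        variables.append(var)
        alternatives = [
            f"{{ <{entity_iri}> <{p}> {var} }}" for p in c.lod_properties
        ]
        blocks.append("  OPTIONAL { " + " UNION ".join(alternatives) + " }")
    header = "SELECT " + " ".join(variables) + " WHERE {"
    return header + "\n" + "\n".join(blocks) + "\n}"


# Illustrative usage with made-up IRIs (assumptions, not real LOD vocabulary).
example_query = build_query(
    "http://example.org/entity/1",
    [
        Correspondence(
            "climate",
            ["http://example.org/p/climate", "http://example.org/o/climateType"],
        ),
        Correspondence("country", ["http://example.org/o/country"]),
    ],
)
print(example_query)
```

In a sketch like this, the values bound to each variable would then be written into the ontology as assertions about the corresponding entity; how the thesis actually handles that step is not stated in the abstract.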

https://tel.archives-ouvertes.fr/tel-01399253
Contributor : ABES STAR
Submitted on : Friday, November 18, 2016 - 3:06:09 PM
Last modification on : Wednesday, November 3, 2021 - 7:35:40 AM
Long-term archiving on : Monday, March 20, 2017 - 6:09:55 PM

File

70008_ALEC_2016_diffusion.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01399253, version 1

Citation

Céline Alec. Enrichissement et peuplement d’ontologie à partir de textes et de données du LOD : Application à l’annotation automatique de documents. Intelligence artificielle [cs.AI]. Université Paris Saclay (COmUE), 2016. French. ⟨NNT : 2016SACLS228⟩. ⟨tel-01399253⟩

Metrics

Record views : 523
File downloads : 712