Vector representations and machine learning for the alignment of text entities with ontology concepts: application to biology

Abstract : The rapid growth in the volume of textual data makes it difficult to analyze without the assistance of tools. However, text written in natural language is unstructured data: a computer program cannot interpret it directly, and without specialized processing the information it contains remains largely under-exploited. Among the tools for automatic information extraction from text, we focus on automatic text interpretation methods for the entity normalization task, which consists in automatically matching textual entity mentions to concepts in a reference terminology. To accomplish this task, we propose a new approach that aligns two types of vector representations of entities, each capturing part of their meaning: word embeddings for text mentions and concept embeddings for concepts, designed specifically for this work. The alignment between the two is learned through supervised learning. The developed methods have been evaluated on a reference dataset from the biological domain, on which they represent the state of the art at the time of writing. These methods are integrated into a natural language processing software suite, and the code is freely shared.
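
The alignment described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the learned model, so everything here is an assumption made for illustration: a linear map trained by stochastic gradient descent on squared error, cosine similarity to pick the nearest concept, two-dimensional toy vectors, and the hypothetical identifiers `C_a` and `C_b`. It is not the thesis implementation.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fit_linear_map(X, Y, lr=0.1, epochs=500):
    """Learn W such that W @ x approximates y, using plain SGD on squared error.

    X: mention embeddings (training inputs); Y: embeddings of their gold concepts.
    A linear map is an assumption of this sketch, not a detail from the abstract.
    """
    d_in, d_out = len(X[0]), len(Y[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = matvec(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    W[i][j] -= lr * err * x[j]
    return W

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def normalize(mention_vec, W, concept_vecs):
    """Project a mention into concept space and return the closest concept id."""
    projected = matvec(W, mention_vec)
    return max(concept_vecs, key=lambda cid: cosine(projected, concept_vecs[cid]))

# Toy concept embeddings with hypothetical identifiers.
concepts = {"C_a": [1.0, 0.0], "C_b": [0.0, 1.0]}

# Toy annotated mentions: each mention embedding is paired with its gold concept.
train_mentions = [[1.0, 0.0], [0.0, 1.0]]
train_targets = [concepts["C_b"], concepts["C_a"]]

W = fit_linear_map(train_mentions, train_targets)

# An unseen mention close to the first training mention.
print(normalize([0.8, 0.2], W, concepts))  # prints C_b
```

The toy data deliberately makes the mention space and the concept space disagree (the axes are swapped), so a nearest-neighbour lookup without the learned map would return the wrong concept. This is the role supervised alignment plays in the sketch.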

https://tel.archives-ouvertes.fr/tel-02166253
Contributor : Abes Star
Submitted on : Wednesday, June 26, 2019 - 4:18:09 PM
Last modification on : Thursday, July 4, 2019 - 6:37:48 AM

File

75823_FERRE_2019_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02166253, version 1

Citation

Arnaud Ferré. Représentations vectorielles et apprentissage automatique pour l’alignement d’entités textuelles et de concepts d’ontologie : application à la biologie. Intelligence artificielle [cs.AI]. Université Paris-Saclay, 2019. Français. ⟨NNT : 2019SACLS117⟩. ⟨tel-02166253⟩

Metrics

Record views : 308
File downloads : 157