Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine (Extraction of n-ary relation arguments from texts guided by a domain OTR)

Abstract : Today, a huge amount of data is available to the research community through web-based libraries. Enhancing the data collected from scientific documents is a major challenge if domain knowledge is to be analyzed and reused efficiently. To be enhanced, data must be extracted from documents and structured in a common representation using a controlled vocabulary, as in ontologies. Our research addresses knowledge engineering issues for experimental data extracted from scientific articles, with the aim of reusing them in decision support systems. Experimental data can be represented by n-ary relations, which link a studied object (e.g. food packaging, transformation process) with its features (e.g. oxygen permeability in packaging, biomass grinding), and capitalized in an Ontological and Terminological Resource (OTR). An OTR associates an ontology with a terminological and/or linguistic part in order to establish a clear distinction between a term and the notion it denotes (the concept). Our work focuses on n-ary relation extraction from scientific documents in order to populate a domain OTR with new instances. Our contributions combine Natural Language Processing (NLP) with data mining approaches guided by the domain OTR. First, we focus on the extraction of units of measure, which are known to be difficult to identify because of their typographic variations. We rely on automatic classification of texts, using supervised learning methods, to reduce the search space for unit variants, and we then propose a new similarity measure that identifies them by taking their syntactic properties into account. Second, we adapt and combine data mining methods (sequential pattern and rule mining) with syntactic analysis in order to overcome the challenge of identifying and extracting n-ary relation instances buried in unstructured texts.
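To make the problem of typographic unit variants concrete, here is a minimal sketch of matching a candidate string against known unit labels. It is not the similarity measure proposed in the thesis: it swaps in a generic character-level ratio from Python's standard `difflib` module, applied after a simple typographic normalization. The label list `KNOWN_UNITS`, the helper names `normalize` and `best_match`, and the 0.8 threshold are all illustrative assumptions.

```python
import difflib
import unicodedata

# Hypothetical unit labels, standing in for the terminological part of a domain OTR.
KNOWN_UNITS = ["cm3 m-2 day-1", "mol m-1 s-1 Pa-1", "g m-2 day-1"]


def normalize(text):
    """Fold common typographic variants of a unit into one plain form."""
    # NFKC turns superscript digits and signs into their plain counterparts.
    text = unicodedata.normalize("NFKC", text)
    # Map the Unicode minus sign to an ASCII hyphen.
    text = text.replace("\u2212", "-")
    # Treat product separators as spaces and drop explicit exponent carets.
    for sep in ("\u00b7", "\u22c5", "\u00d7", "."):
        text = text.replace(sep, " ")
    text = text.replace("^", "")
    # Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def best_match(candidate, known_units=KNOWN_UNITS, threshold=0.8):
    """Return (label, score) for the closest known unit, or None below threshold."""
    cand = normalize(candidate)
    best_label, best_score = None, 0.0
    for label in known_units:
        # Generic character-level similarity; an assumption, not the thesis measure.
        score = difflib.SequenceMatcher(None, cand, normalize(label)).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None


if __name__ == "__main__":
    for variant in ["cm³·m⁻²·day⁻¹", "cm3.m-2.d-1", "hello world"]:
        print(variant, "->", best_match(variant))
```

In the approach the abstract describes, a text classifier would first narrow down where unit variants are likely to occur, so a comparison of this kind would run only on those text segments.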
Document type :
Theses

Cited literature [72 references]

https://tel.archives-ouvertes.fr/tel-01333058
Contributor : Abes Star
Submitted on : Thursday, June 16, 2016 - 6:02:08 PM
Last modification on : Friday, June 14, 2019 - 1:58:14 AM

File

These_Berrahou_2015.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01333058, version 1

Citation

Soumia Lilia Berrahou. Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine. Traitement du texte et du document. Université Montpellier, 2015. Français. ⟨NNT : 2015MONTS019⟩. ⟨tel-01333058⟩

Metrics

Record views

449

File downloads

422