
Extraction d'Information et modélisation de connaissances à partir de Notes de Communication Orale
(Information Extraction and Knowledge Modelling from Oral Communication Notes)

Abstract: Despite the rise of Information Extraction and the development of many applications over the last twenty years, the task still runs into difficulties when it is applied to atypical texts such as oral communication notes.

Oral communication notes are texts produced from an oral communication (meeting, talk, etc.) in order to synthesize its informative content. The constraints under which they are written (speed, limited amount of writing) give them linguistic characteristics to which traditional Natural Language Processing and Information Extraction methods are poorly suited. Although rich in information, these notes are not exploited by systems that extract information from texts.

In this thesis, we propose an extraction method adapted to oral communication notes. This method, called MEGET, is based on an ontology that depends on the information to be extracted (the “extraction ontology”). This ontology is obtained by unifying an “ontology of needs”, which describes the information to be found, with an “ontology of terms”, which conceptualizes the corpus terms related to the required information. The ontology of terms is built from terminology extracted from the texts and enriched with terms found in specialized documents.

The extraction ontology is formalized as a set of rules, which are provided as a knowledge base to the extraction system SYGET. This system (1) labels each instance of every element of the extraction ontology and (2) extracts the information. The approach is validated on several corpora.
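To make the pipeline described in the abstract more concrete, here is a toy sketch of its shape: an ontology of needs and an ontology of terms are combined into rules, which are then used to (1) label instances and (2) extract them. The abstract does not specify how MEGET represents ontologies or how SYGET encodes its rules, so everything below is an assumption for illustration only: ontologies are plain Python dicts and sets, "unification" is reduced to keeping the term concepts that answer a need, and each rule is a regular expression. The names `ONTOLOGY_OF_NEEDS`, `ONTOLOGY_OF_TERMS`, `unify`, `label`, and `extract`, as well as the sample note, are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical "ontology of needs": the concepts whose instances must be found.
ONTOLOGY_OF_NEEDS = {"Meeting", "Person", "Day"}

# Hypothetical "ontology of terms": concept -> regex patterns for the surface
# forms seen in the corpus, including note-style abbreviations ("mtg", "tue").
ONTOLOGY_OF_TERMS = {
    "Meeting": ["mtg", "meeting", "call"],
    "Person": [r"[A-Z]\. ?[A-Z][a-z]+"],  # initial + surname, e.g. "J. Smith"
    "Day": ["mon", "tue", "wed", "thu", "fri"],
    "Price": [r"\d+ ?(?:eur|k)"],  # a known term, but no need asks for it
}


@dataclass
class Label:
    concept: str
    start: int
    end: int
    text: str


def unify(needs, terms):
    """Toy 'extraction ontology': one compiled rule per concept that is both
    required (needs) and conceptualized in the corpus (terms)."""
    return {
        concept: re.compile(r"\b(?:" + "|".join(terms[concept]) + r")\b")
        for concept in sorted(needs & set(terms))
    }


def label(note, rules):
    """Step (1): tag every match of every rule, sorted by position in the note."""
    labels = [
        Label(concept, m.start(), m.end(), m.group())
        for concept, rule in rules.items()
        for m in rule.finditer(note)
    ]
    return sorted(labels, key=lambda lab: lab.start)


def extract(labels):
    """Step (2): gather the labelled instances into one list per concept."""
    record = {}
    for lab in labels:
        record.setdefault(lab.concept, []).append(lab.text)
    return record


NOTE = "mtg w/ J. Smith re budget 5k, next call tue"
rules = unify(ONTOLOGY_OF_NEEDS, ONTOLOGY_OF_TERMS)
print(extract(label(NOTE, rules)))
# -> {'Meeting': ['mtg', 'call'], 'Person': ['J. Smith'], 'Day': ['tue']}
```

Note that "5k" is recognized as a term in this toy ontology of terms but is not extracted, because no need asks for a price: restricting the rules to the intersection of needs and terms is what keeps the output focused on the required information.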
Document type: Thesis

Cited literature: 130 references

https://tel.archives-ouvertes.fr/tel-00109400
Contributor: Fabrice Even
Submitted on: Tuesday, October 24, 2006 - 2:22:27 PM
Last modification on: Monday, October 19, 2020 - 11:12:45 AM
Long-term archiving on: Friday, November 25, 2016 - 2:02:16 PM

Identifiers

  • HAL Id: tel-00109400, version 1


Citation

Fabrice Even. Extraction d'Information et modélisation de connaissances à partir de Notes de Communication Orale. Other [cs.OH]. Université de Nantes, 2005. In French. ⟨tel-00109400⟩


Metrics

Record views: 596

File downloads: 4273