Désignations nominales des événements : étude et extraction automatique dans les textes (Nominal designations of events: study and automatic extraction from texts)

Abstract : This PhD thesis studies the nominal designations of events with a view to their automatic extraction. The work belongs to natural language processing, a multidisciplinary field that draws on linguistics and computer science. Information extraction aims to analyze natural language documents and extract the information relevant to a particular application. Many information extraction campaigns have been conducted toward this goal: for each event considered, the task is to extract certain information (participants, dates, numbers, etc.). From the outset, these challenges have been closely tied to named entities (the "significant" elements of texts, such as names of people or places). All of this information is organized around the event, and such work pays no attention to the words used to describe the event itself, especially when the event is designated by a noun. The event is treated as an all-encompassing whole, defined by the quantity and quality of the information that composes it. Unlike work in general information retrieval, our main interest is solely in the way events are named, and in particular in the nominal designations used. For us, an event is something that happens and that is worth talking about. The most important events are the subject of newspaper articles or appear in history books. An event can be evoked by a verbal or a nominal description. In this thesis, we reflect on the notion of event. We observed and compared the different aspects presented in the state of the art in order to construct a definition of the event and a general typology of events suited to the context of our work and to nominal designations of events. From our corpus studies we also identified different ways in which event names are formed, and we show that each can be ambiguous in various ways. For these studies, building an annotated corpus is an essential step, which gave us the opportunity to develop an annotation guide dedicated to nominal designations of events.
We studied the relevance and quality of existing lexicons for our automatic extraction task. We also examined the contexts in which nouns appear in order to determine their eventness; for this purpose, we used extraction rules. Following these studies, we extracted a weighted lexicon of relative eventness (whose distinguishing feature is that it is dedicated to the extraction of nominal events), which reflects the fact that some nouns are more likely than others to denote events. Used as a cue for the extraction of event names, this weight makes it possible to extract nouns that are absent from existing standard lexicons. Finally, using machine learning, we worked on learning contextual features, based in part on syntax, to extract event names.
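
To make the idea of a weighted eventness lexicon concrete, here is a minimal Python sketch. It is not the method of the thesis, whose rules and weighting scheme are not given in this abstract. It assumes that a noun's weight is the fraction of its occurrences found in an event-like context, that such a context is signalled by a few hypothetical English cues ("during the", "after the", "before the", "since the"), and that the word following "the" is a noun (a crude stand-in for part-of-speech tagging). The names EVENT_CUES, ANY_NOUN and eventness_weights are illustrative.

```python
import re
from collections import Counter

# ASSUMPTION: illustrative cues only; these are not the thesis's extraction rules.
EVENT_CUES = re.compile(r"\b(?:during|after|before|since) the (\w+)", re.IGNORECASE)
# ASSUMPTION: the word after "the" is treated as a noun (no real POS tagging).
ANY_NOUN = re.compile(r"\bthe (\w+)", re.IGNORECASE)


def eventness_weights(sentences):
    """Return {noun: share of its occurrences found after an event cue}."""
    in_event_context = Counter()
    total = Counter()
    for sentence in sentences:
        # Count every occurrence of a candidate noun.
        for noun in ANY_NOUN.findall(sentence):
            total[noun.lower()] += 1
        # Count the occurrences that follow a temporal cue.
        for noun in EVENT_CUES.findall(sentence):
            in_event_context[noun.lower()] += 1
    # Relative weight between 0 and 1; nouns never seen after a cue get 0.0.
    return {noun: in_event_context[noun] / total[noun] for noun in total}


corpus = [
    "During the war, the city suffered.",
    "After the war the city was rebuilt.",
    "The war ended.",
    "Before the election the city voted.",
]
weights = eventness_weights(corpus)
for noun, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{noun}: {weight:.2f}")
```

On this toy corpus, "war" and "election" receive high weights and "city" receives zero, which is the kind of ranking such a lexicon is meant to capture. A noun absent from any pre-existing lexicon can still be scored, since the weight depends only on observed contexts.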
Document type : Theses
Cited literature [110 references]

https://tel.archives-ouvertes.fr/tel-00758062
Contributor : ABES STAR
Submitted on : Wednesday, November 28, 2012 - 9:07:13 AM
Last modification on : Sunday, June 26, 2022 - 11:57:30 AM
Long-term archiving on : Saturday, December 17, 2016 - 3:33:53 PM

File

VD2_ARNULPHY_BEATRICE_02102012...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00758062, version 1

Citation

Béatrice Arnulphy. Désignations nominales des événements : étude et extraction automatique dans les textes. Other [cs.OH]. Université Paris Sud - Paris XI, 2012. French. ⟨NNT : 2012PA112216⟩. ⟨tel-00758062⟩

Metrics

Record views : 468
File downloads : 987