Named Entity Recognition by Mining Annotation Rules - Interpreting Annotation Markers as Local Structuring Instructions

Damien Nouvel (1)
(1) BDTLN - Databases and Natural Language Processing
LIFAT - Laboratoire d'Informatique Fondamentale et Appliquée de Tours
Abstract: Over the last decades, the development of information and communication technologies has substantially changed the way we access knowledge. Given the volume and diversity of data streams, developing robust and efficient information retrieval technologies has become a necessity. In this context, named entities (persons, locations, organizations, numerical expressions, brands, functions, etc.) may be needed to categorize, index or, more generally, manipulate content. Our work focuses on recognizing and annotating them in transcripts of radio and TV broadcasts, in the context of the Ester2 and Etape evaluation campaigns. In the first part, we introduce our problem: the automatic recognition of named entities. We describe the analyses commonly carried out to process natural language, examine the linguistic properties of named entities (related notions, typologies, evaluation and annotation) and review state-of-the-art approaches. Building on their linguistic nature and interpreting annotation as local structuring, we propose an instruction-driven approach based on annotation markers (tags), whose originality lies in considering those markers in isolation. In the second part, we present the formalism used to explore the data and introduce our formal framework. Sentences are represented as sequences of enriched items (morpho-syntax, lexicon) that preserve ambiguity. We also propose an alternative representation by segments, which limits the combinatorial search. Patterns correlated with annotation markers are extracted as annotation rules, which models can then use to actually annotate texts. The last part presents the experimental framework, the implemented system (mXS) and the results obtained. We show the value of extracting annotation rules broadly, including those with low confidence. We also experiment with segment patterns, which yield interesting performance on deeply structured data.
More generally, we report the system's performance from various points of view and in various configurations. The results show that the proposed approach is competitive and that it opens up perspectives for observing natural language and for automatic annotation using data mining.
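The abstract describes three ideas: sentences as sequences of enriched items that preserve ambiguity, annotation markers considered in isolation, and patterns correlated with those markers extracted as annotation rules with a confidence. Below is a minimal Python sketch of those ideas under strong assumptions. It is not the mXS implementation, whose pattern language and models are described in the thesis and not reproduced here. All of the following are hypothetical: the names `boundary_patterns`, `mine_rules`, `annotate` and `tok`, the `w:`/`pos:` feature prefixes, the one-token context window on each side of a boundary, and the threshold-based decision rule.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical representation (not mXS's actual format): a sentence is a list of
# tokens, each token an itemset of features. Alternative analyses can coexist in
# one itemset, so ambiguity is preserved rather than resolved.
Sentence = list[frozenset[str]]
# Markers sit on boundaries: index i means "just before token i".
Markers = dict[int, set[str]]

def boundary_patterns(sentence: Sentence, i: int) -> set[tuple[str, str]]:
    """All (left item, right item) pairs around boundary i (one token per side)."""
    left = sentence[i - 1] if i > 0 else frozenset({"<s>"})
    right = sentence[i] if i < len(sentence) else frozenset({"</s>"})
    return set(product(left, right))

def mine_rules(corpus: list[tuple[Sentence, Markers]], min_support: int = 1):
    """Return {(pattern, marker): (support, confidence)}; each marker is handled in isolation."""
    occurrences = Counter()    # pattern -> number of boundaries where it matches
    cooccurrences = Counter()  # (pattern, marker) -> matches where that marker is present
    for sentence, markers in corpus:
        for i in range(len(sentence) + 1):
            for p in boundary_patterns(sentence, i):
                occurrences[p] += 1
                for m in markers.get(i, ()):
                    cooccurrences[(p, m)] += 1
    # Low-confidence rules are kept: only a minimum support is required.
    return {
        (p, m): (n, n / occurrences[p])
        for (p, m), n in cooccurrences.items()
        if n >= min_support
    }

def annotate(sentence: Sentence, rules, threshold: float = 0.5) -> dict[int, list[str]]:
    """Naive decision rule (an assumption, not the thesis's models): insert a marker
    at a boundary if its best matching rule reaches the confidence threshold."""
    by_pattern = defaultdict(list)
    for (p, m), (_, conf) in rules.items():
        by_pattern[p].append((m, conf))
    result = {}
    for i in range(len(sentence) + 1):
        best = {}
        for p in boundary_patterns(sentence, i):
            for m, conf in by_pattern.get(p, ()):
                best[m] = max(conf, best.get(m, 0.0))
        chosen = sorted(m for m, conf in best.items() if conf >= threshold)
        if chosen:
            result[i] = chosen
    return result

def tok(word: str, pos: str) -> frozenset[str]:
    """Hypothetical helper: a token enriched with its word form and part of speech."""
    return frozenset({f"w:{word}", f"pos:{pos}"})

corpus = [
    ([tok("M.", "ABR"), tok("Dupont", "NPP"), tok("arrive", "V")], {0: {"<pers>"}, 2: {"</pers>"}}),
    ([tok("Mme", "ABR"), tok("Durand", "NPP"), tok("part", "V")], {0: {"<pers>"}, 2: {"</pers>"}}),
    ([tok("St", "ABR"), tok("Malo", "NPP")], {0: {"<loc>"}, 2: {"</loc>"}}),
]
rules = mine_rules(corpus)
print(rules[(("<s>", "pos:ABR"), "<pers>")])  # (2, 0.6666666666666666)
print(rules[(("<s>", "pos:ABR"), "<loc>")])   # (1, 0.3333333333333333)
print(annotate([tok("M.", "ABR"), tok("Martin", "NPP"), tok("dort", "V")], rules))
# {0: ['<pers>'], 2: ['</pers>']}
```

Because each marker is predicted independently, nothing in this sketch guarantees that opening and closing markers come out balanced. A complete system would need a further step to reconcile them into well-formed annotations.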

Cited literature: 55 references

https://tel.archives-ouvertes.fr/tel-00788630
Contributor: Damien Nouvel
Submitted on: Thursday, February 14, 2013 - 5:49:20 PM
Last modification on: Tuesday, July 2, 2019 - 4:02:04 PM
Long-term archiving on: Sunday, April 2, 2017 - 12:04:15 AM

Identifiers

  • HAL Id: tel-00788630, version 1


Citation

Damien Nouvel. Reconnaissance des entités nommées par exploration de règles d'annotation - Interpréter les marqueurs d'annotation comme instructions de structuration locale. Machine Learning [cs.LG]. Université François Rabelais - Tours, 2012. French. ⟨tel-00788630⟩


Metrics

Record views: 1111
File downloads: 2301