Automatic identification of entities for the enrichment of textual content

Rosa Stern (1)
(1) ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: This dissertation proposes a method and a system for identifying entities (persons, locations, organizations) mentioned in the texts produced by the news agency Agence France Presse (AFP), with a view to automatic content enrichment. The fields concerned by this task are examined through their relationships to one another: the Semantic Web, Information Extraction and in particular Named Entity Recognition (NER), Semantic Annotation, and Entity Linking. Building on this study, the industrial need expressed by Agence France Presse is formalized as a set of specifications that guide the development of a solution based on Natural Language Processing tools. The approach adopted for identifying the target entities is then described. The proposed system handles the NER step with any existing module. Its output, possibly combined with that of other modules, is evaluated by a linking module able to (i) align a given mention with the entity it denotes in an inventory built prior to the task, (ii) detect denotations that have no counterpart in the inventory, and (iii) reconsider the denotational reading of mentions (false positive detection). The Nomos system is developed to this end for processing French data. Its design also led to the construction and use of resources integrated into the Linked Data network, as well as a rich knowledge base about the target entities.
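The three-way decision that the abstract attributes to the linking module (align, keep as unaligned, reject as false positive) can be illustrated with a toy sketch. It is not the Nomos implementation. The inventory, the exact alias-matching candidate lookup, the `non_entity_forms` list used for false-positive detection, and every identifier (`Mention`, `Entity`, `link`, `ex:paris`) are hypothetical assumptions; a realistic linker would score and rank candidates instead of taking the first match.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Mention:
    text: str
    ner_type: str  # type proposed by an upstream NER module: "PER", "LOC" or "ORG"


@dataclass(frozen=True)
class Entity:
    entity_id: str  # hypothetical identifier in a pre-built inventory
    entity_type: str
    aliases: FrozenSet[str]


@dataclass(frozen=True)
class Decision:
    mention: Mention
    label: str  # "LINKED", "NIL" (absent from inventory) or "FALSE_POSITIVE"
    entity_id: Optional[str] = None


def candidates(mention: Mention, inventory: List[Entity]) -> List[Entity]:
    """Entities having an alias equal to the mention (case-insensitive)."""
    key = mention.text.casefold()
    return [e for e in inventory if key in {a.casefold() for a in e.aliases}]


def link(mention: Mention, inventory: List[Entity],
         non_entity_forms: Set[str]) -> Decision:
    # (iii) Toy false-positive check: surface forms listed as
    # non-denotational readings are rejected outright.
    if mention.text.casefold() in non_entity_forms:
        return Decision(mention, "FALSE_POSITIVE")
    # (i) Alignment: prefer candidates whose type agrees with the NER type,
    # otherwise fall back to any alias match.
    cands = candidates(mention, inventory)
    typed = [e for e in cands if e.entity_type == mention.ner_type]
    best = typed or cands
    if best:
        return Decision(mention, "LINKED", best[0].entity_id)
    # (ii) No candidate: keep the mention as a denotation missing from the inventory.
    return Decision(mention, "NIL")


if __name__ == "__main__":
    inventory = [
        Entity("ex:paris", "LOC", frozenset({"Paris"})),
        Entity("ex:washington_dc", "LOC", frozenset({"Washington"})),
        Entity("ex:george_washington", "PER", frozenset({"Washington", "George Washington"})),
    ]
    non_entity_forms = {"mercredi"}  # e.g. a weekday wrongly tagged by NER
    for m in [Mention("Paris", "LOC"), Mention("Washington", "PER"),
              Mention("Jean Dupont", "PER"), Mention("Mercredi", "ORG")]:
        d = link(m, inventory, non_entity_forms)
        print(m.text, d.label, d.entity_id)
```

In this sketch the NER type only breaks ties between alias matches, so an NER type error does not by itself prevent alignment. That is one possible design choice among several, not a claim about the thesis.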
Document type: Thesis

Cited literature: 165 references

https://tel.archives-ouvertes.fr/tel-00939420
Contributor: Rosa Stern
Submitted on: Thursday, January 30, 2014, 4:46:02 PM
Last modification on: Friday, January 4, 2019, 5:33:24 PM

Identifiers

  • HAL Id: tel-00939420, version 1


Citation

Rosa Stern. Identification automatique d'entités pour l'enrichissement de contenus textuels [Automatic identification of entities for the enrichment of textual content]. Computation and Language [cs.CL]. Université Paris-Diderot - Paris VII, 2013. French. ⟨tel-00939420⟩


Metrics

Record views: 498
File downloads: 2698