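Vers des moteurs de recherche "intelligents" : un outil de détection automatique de thèmes. Méthode basée sur l'identification automatique des chaînes de référence [Towards "intelligent" search engines: a tool for automatic topic detection. A method based on the automatic identification of reference chains]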

Laurence Longo 1
1 Fonctionnements Discursifs & Traduction
LILPA - Linguistique, Langues et Parole
Abstract : This thesis in Natural Language Processing aims at optimizing document classification in search engines. It focuses on the development of ATDS-fr, a tool that automatically detects document topics. Following a knowledge-poor approach, the hybrid method combines statistical techniques for topic segmentation with linguistic methods that identify cohesive markers. Among these markers, reference chains, i.e. sequences of referential expressions referring to the same entity (e.g. Paul ... he ... this man), receive special attention because they are important topic markers: they signal topic introduction, maintenance and change. Based on a study of reference chains extracted from a corpus of various textual genres (newspapers, public reports, European laws, editorials and novels), we developed RefGen, an automatic reference chain identification module, which was evaluated using current coreference metrics.
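To make the notion of a reference chain concrete, here is a minimal Python sketch of how the abstract's example (Paul ... he ... this man) could be represented. It is an illustration only, not RefGen or ATDS-fr: the names Mention, ReferenceChain and candidate_topic are hypothetical, the second chain is invented for the example, and choosing the entity with the most mentions is a simplifying assumption standing in for the thesis's actual topic detection method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mention:
    text: str      # surface form of the referential expression
    sentence: int  # index of the sentence containing it


@dataclass
class ReferenceChain:
    entity: str  # hypothetical label for the shared referent
    mentions: List[Mention] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        """Return the first and last sentence index covered by the chain."""
        positions = [m.sentence for m in self.mentions]
        return min(positions), max(positions)


def candidate_topic(chains: List[ReferenceChain]) -> str:
    """Pick the entity whose chain has the most mentions (assumed heuristic)."""
    return max(chains, key=lambda c: len(c.mentions)).entity


# Illustrative data: the first chain mirrors the abstract's example.
chains = [
    ReferenceChain(
        "Paul",
        [Mention("Paul", 0), Mention("he", 1), Mention("this man", 3)],
    ),
    ReferenceChain("the city", [Mention("the city", 2)]),
]
```

Under these assumptions, candidate_topic(chains) selects the entity of the longest chain, and span() gives the sentence range over which that entity is maintained, which is the kind of signal the abstract describes as marking topic introduction, maintenance and change.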
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00939243
Contributor : Laurence Longo
Submitted on : Thursday, January 30, 2014 - 2:39:05 PM
Last modification on : Monday, January 20, 2020 - 3:26:02 PM
Long-term archiving on : Sunday, April 9, 2017 - 2:54:05 AM

Identifiers

  • HAL Id : tel-00939243, version 1

Citation

Laurence Longo. Vers des moteurs de recherche "intelligents" : un outil de détection automatique de thèmes. Méthode basée sur l'identification automatique des chaînes de référence. Linguistique. Université de Strasbourg, 2013. Français. ⟨tel-00939243⟩
