Analyse automatique de structures thématiques discursives - Application à la recherche d'information [Automatic analysis of discursive thematic structures - Application to information retrieval]

Frédérik Bilhaut 1
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : This PhD thesis belongs to the field of Natural Language Processing (NLP) and concerns the automated semantic analysis of discourse structure. More precisely, we address thematic analysis, which studies the structure of texts with respect to the organisation of their informational content. This task is of particular importance for Information Retrieval, which is the primary application of our work. Because the concept of "theme" is particularly complex yet rarely studied in its own right in the information retrieval literature, the first part of the dissertation is devoted to an extensive bibliographical study of the notions of theme, topic, subject, and aboutness within the fields of linguistics, information science and NLP. From this study we draw a definition of the theme as a discursive, semantic and structured object. We then propose several models and processes, devoted first to the semantic analysis of geographical documents, and second to the automatic analysis of temporal discourse frames in the sense of Michel Charolles. We generalise this work by introducing the notions of composite topic and semantic axis. The last part is devoted to the LinguaStream platform, an integrated experimentation environment that we designed to ease the elaboration of operational linguistic models, and which led us to propose some original methodological principles.
Document type :
Theses
Cited literature [80 references]

https://tel.archives-ouvertes.fr/tel-00258766
Contributor : Hal System
Submitted on : Monday, February 25, 2008 - 11:30:48 AM
Last modification on : Friday, October 23, 2020 - 4:37:16 PM
Long-term archiving on : Friday, September 28, 2012 - 10:10:24 AM

Identifiers

  • HAL Id : tel-00258766, version 1

Citation

Frédérik Bilhaut. Analyse automatique de structures thématiques discursives - Application à la recherche d'information. Other [cs.OH]. Université de Caen, 2006. French. ⟨tel-00258766⟩

Metrics

Record views

606

Files downloads

2195