Habilitation à diriger des recherches

Des mots aux textes. Analyse sémantique pour l'accès à l'information

Abstract : Why is it so difficult to understand a language automatically, even when the goal is only a limited kind of understanding based on known facts? A key reason is the great variability of language, which remains very challenging for a computer. This is the problem I try to tackle: how can similar meanings be identified across different expressions? How can fragments of meaning be identified in a sea of texts? This thesis consists of four chapters. I first consider recent developments in computational linguistics and show that the availability of large corpora has resulted in more functional Natural Language Processing (NLP). This evolution could have a major impact on theory: corpora, and the automatic acquisition of knowledge from corpora (especially using machine learning techniques), make it possible to derive semantics from language use. Each of the next three chapters deals with a different level of analysis: lexical semantics for semantic annotation, predicative semantics for relation extraction, and text semantics for technical document modelling. I suggest that these levels form a continuum, since they share fundamental similarities that affect the techniques used. In the conclusion, I emphasize what the three levels have in common: the complex problem of the relations between words and concepts, the fuzziness of linguistic categories, and the great variability of language. I close with a discussion of the relationship between NLP and linguistics, before proposing alternative routes for future research.
Document type : Habilitation à diriger des recherches

https://tel.archives-ouvertes.fr/tel-00436064
Contributor : Thierry Poibeau
Submitted on : Wednesday, November 25, 2009 - 10:41:18 PM
Last modification on : Monday, October 19, 2020 - 11:05:55 AM
Long-term archiving on : Thursday, June 17, 2010 - 10:09:14 PM

Identifiers

  • HAL Id : tel-00436064, version 1

Citation

Thierry Poibeau. Des mots aux textes. Analyse sémantique pour l'accès à l'information. Human-Computer Interaction [cs.HC]. Université Paris-Nord - Paris XIII, 2008. ⟨tel-00436064⟩
