Outils d'exploration de corpus et désambiguïsation lexicale automatique

Abstract: This thesis addresses automatic word sense disambiguation using supervised learning methods. In the first part, we present a set of powerful tools for processing tagged linguistic corpora. To build these tools, we developed a C++ library that implements an expressive and elaborate corpus-query language based on meta-regular expressions. In the second part, we compare several supervised learning algorithms. We then use them to carry out a systematic, in-depth study of disambiguation criteria based on word co-occurrence and, more generally, on n-gram co-occurrence. Our results do not always agree with established practice in the field. For example, we show that omitting grammatical words decreases performance and that bigrams yield better results than unigrams.
Document type: Thesis
Other. Université de Provence - Aix-Marseille I, 2003. French


https://tel.archives-ouvertes.fr/tel-00004475
Contributor: Laurent AUDIBERT
Submitted on: Wednesday, February 4, 2004 - 2:43:29 PM
Last modification on: Wednesday, February 4, 2004 - 2:43:29 PM

Identifiers

  • HAL Id: tel-00004475, version 1

Citation

Laurent AUDIBERT. Outils d'exploration de corpus et désambiguïsation lexicale automatique. Other. Université de Provence - Aix-Marseille I, 2003. French. <tel-00004475>

Metrics

Record views: 134

Document downloads: 619