Outils d'exploration de corpus et désambiguïsation lexicale automatique

Abstract: This thesis addresses automatic word sense disambiguation using supervised learning methods. In the first part, we present a set of powerful tools for processing tagged linguistic corpora. To build these tools, we developed a C++ library that implements an expressive and elaborate corpus-query language based on meta-regular expressions. In the second part, we compare various supervised learning algorithms. We then use them to carry out a systematic, in-depth study of various disambiguation criteria based on word co-occurrence and, more generally, on n-gram co-occurrence. Our results sometimes run counter to established practice in the field. For example, we show that omitting grammatical words decreases performance and that bigrams yield better results than unigrams.
Document type:
Thesis
Other [cs.OH]. Université de Provence - Aix-Marseille I, 2003. In French.


https://tel.archives-ouvertes.fr/tel-00004475
Contributor: Laurent Audibert
Submitted on: Wednesday, February 4, 2004 - 2:43:29 PM
Last modification on: Wednesday, February 4, 2004 - 2:43:29 PM
Document(s) archived on: Wednesday, September 12, 2012 - 1:10:10 PM

Identifiers

  • HAL Id: tel-00004475, version 1

Citation

Laurent Audibert. Outils d'exploration de corpus et désambiguïsation lexicale automatique. Autre [cs.OH]. Université de Provence - Aix-Marseille I, 2003. Français. <tel-00004475>
