
Outils d'exploration de corpus et désambiguïsation lexicale automatique

Abstract : This thesis addresses automatic word sense disambiguation using supervised learning methods. In the first part, we present a set of powerful tools for processing tagged linguistic corpora. To build these tools, we developed a C++ library that implements an expressive corpus-query language based on meta-regular expressions. In the second part, we compare several supervised learning algorithms. We then use them to carry out a systematic, in-depth study of disambiguation criteria based on word co-occurrence and, more generally, on n-gram co-occurrence. Our results do not always agree with common practice in the field. For example, we show that omitting grammatical words decreases performance and that bigrams yield better results than unigrams.
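To make the co-occurrence criteria mentioned in the abstract concrete, here is a minimal Python sketch of extracting unigram and bigram context features around an ambiguous word, with an optional filter for grammatical words. It is an illustration only, not the thesis's C++ library or its query language. The function name `context_features`, the stop list `GRAMMATICAL_WORDS`, the window size, and the example sentence are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical stop list of grammatical (function) words, assumed for illustration.
GRAMMATICAL_WORDS = {"the", "of", "a", "to", "in", "on", "and"}


def context_features(tokens, target_index, window=2, n=1, drop_grammatical=False):
    """Count the n-grams found within `window` tokens on each side of the target word."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    features = Counter()
    for side in (left, right):
        if drop_grammatical:
            # Optionally discard grammatical words before building n-grams.
            side = [t for t in side if t not in GRAMMATICAL_WORDS]
        # Slide an n-token window over this side of the context.
        for i in range(len(side) - n + 1):
            features[" ".join(side[i:i + n])] += 1
    return features


sentence = "he sat on the bank of the river".split()
target = sentence.index("bank")

unigrams = context_features(sentence, target, n=1)
bigrams = context_features(sentence, target, n=2)
filtered = context_features(sentence, target, n=1, drop_grammatical=True)
```

In this toy sentence the two-word window around "bank" contains only grammatical words, so `filtered` is empty while `unigrams` and `bigrams` still carry features. Feature counts like these would then be passed to a supervised classifier; which classifier and which window perform best is what the thesis evaluates, and this sketch makes no claim about that.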
Contributor : Laurent Audibert
Submitted on : Wednesday, February 4, 2004 - 2:43:29 PM
Last modification on : Tuesday, February 2, 2021 - 3:10:58 AM
Long-term archiving on : Wednesday, September 12, 2012 - 1:10:10 PM


  • HAL Id : tel-00004475, version 1



Laurent Audibert. Outils d'exploration de corpus et désambiguïsation lexicale automatique. Other [cs.OH]. Université de Provence - Aix-Marseille I, 2003. French. ⟨tel-00004475⟩


