
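A linguistic approach to the evaluation of resources extracted by automatic distributional analysis (original title: Une approche linguistique de l'évaluation des ressources extraites par analyse distributionnelle automatique)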

Abstract : In this thesis, we address the evaluation of distributional thesauri from a linguistic point of view. The most common ways of evaluating distributional methods rely on comparison with gold standards such as WordNet or on semantic tasks such as the TOEFL test. However, these evaluation methods are quantitative and thus limit the possibility of performing a linguistic analysis of the distributional neighbours. Our work aims at a better understanding of the distributional behaviour of words in texts through the study of distributional thesauri. First, we take a quantitative approach based on a comparison of several distributional thesauri with gold standards (the DES, a dictionary of synonyms, and JeuxDeMots, a crowdsourced lexical network). This step gave us an overview of the nature of the semantic relations extracted in our distributional thesauri. In a second step, we relied on this comparison to select samples of distributional neighbours for a qualitative study. We focused on "classical" semantic relations such as synonymy, antonymy, hypernymy and meronymy. We considered several protocols to compare the properties of the pairs of distributional neighbours that were found in the gold standards with those of the pairs that were not. Thus, taking into account parameters such as the nature of the corpora from which our distributional thesauri were generated, we explain why some synonyms, hypernyms, etc. can be substituted in texts while others cannot. The purpose of this work is twofold: first, it questions the traditional evaluation methods; second, it shows how distributional thesauri can be used for the study of semantic relations.
Contributor : Abes Star
Submitted on : Tuesday, January 28, 2014 - 11:22:07 PM
Last modification on : Wednesday, October 14, 2020 - 3:44:04 AM
Long-term archiving on : Sunday, April 9, 2017 - 1:58:21 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00937926, version 1


François Morlane-Hondère. Une approche linguistique de l'évaluation des ressources extraites par analyse distributionnelle automatique. Linguistics. Université Toulouse le Mirail - Toulouse II, 2013. French. ⟨NNT : 2013TOU20040⟩. ⟨tel-00937926⟩


