Acquisition sur corpus d'informations lexicales fondées sur la sémantique différentielle (Corpus-based acquisition of lexical information founded on differential semantics)

Mathias Rossignol 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Semantic lexicons are an essential resource for natural language processing applications (automatic summarization, information retrieval, machine translation, etc.) that need access to the meaning of a text. The relevance of the information such lexicons contain is problematic, however: the meaning of a word like soap, for example, varies considerably depending on whether it is considered in a sanitary or a televisual context. A linguistically motivated and cost-effective way of building semantic lexicons adapted to a specific domain of expression is to “learn” word meanings from their actual usage, as observed in a representative collection of texts, or corpus. To meet this challenge, we propose in this document a three-stage methodology for the automatic acquisition of lexical semantic information from texts, based on the linguistic principles of F. Rastier's Interpretative semantics. Through a statistical analysis of word usage, employing both classical and novel methods, we first group together words belonging to the same domain (for example data, transfer, network for IT), then build classes of words with similar meanings (data and information). Finally, we propose a first method for bringing to light fine-grained meaning distinctions between close words (data is more “concrete” than information), thus reaching a level of meaning refinement that, to our knowledge, has not previously been attained by automatic means.
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00524299
Contributor : Patrick Gros
Submitted on : Thursday, October 7, 2010 - 2:33:02 PM
Last modification on : Friday, November 16, 2018 - 1:24:05 AM
Long-term archiving on : Monday, January 10, 2011 - 11:28:45 AM

Identifiers

  • HAL Id : tel-00524299, version 1

Citation

Mathias Rossignol. Acquisition sur corpus d'informations lexicales fondées sur la sémantique différentielle. Human-Computer Interaction [cs.HC]. Université Rennes 1, 2005. French. ⟨tel-00524299⟩
