Apprentissage sur corpus de relations lexicales sémantiques - La linguistique et l'apprentissage au service d'applications du traitement automatique des langues

Pascale Sébillot 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : This document synthesizes our research on the acquisition of lexical resources from textual corpora. More precisely, our work is dedicated to developing methods that automatically learn semantic lexical relations which enrich the description of words, with two objectives: disambiguation and the recognition of intra- and inter-category semantic variants. These relations can also be used in several applications (information retrieval, filtering, etc.). A key point of our research is the strong coupling between the machine learning methods we develop and linguistic theories: the theories provide a framework to determine relevant lexical relations, to validate what is acquired, or even to suggest the machine learning method needed for the acquisition; the acquired elements must also be linguistically motivated and meaningful. We present our work within the framework of F. Rastier's Interpretive Semantics to automatically acquire intra-category paradigmatic links (antonymy, synonymy, etc., but also finer-grained semic ones) from specialized corpora, using statistical techniques (in particular agglomerative hierarchical clustering). We also describe how inductive logic programming, a symbolic machine learning technique, helps us learn noun-verb cross-category relations, with J. Pustejovsky's Generative Lexicon formalism used to check the relevance of the links obtained. The concluding section addresses several open issues, among them the use of these relations to expand queries in information retrieval systems, how to evaluate the contribution of these lexical resources, and the suitability of explanatory machine learning techniques for acquiring information from corpora.
Document type :
Habilitation à diriger des recherches
Cited literature [99 references]

https://tel.archives-ouvertes.fr/tel-00533657
Contributor : Patrick Gros
Submitted on : Monday, November 8, 2010 - 10:12:06 AM
Last modification on : Friday, November 16, 2018 - 1:23:56 AM
Long-term archiving on : Wednesday, February 9, 2011 - 2:51:15 AM

Identifiers

  • HAL Id : tel-00533657, version 1

Citation

Pascale Sébillot. Apprentissage sur corpus de relations lexicales sémantiques - La linguistique et l'apprentissage au service d'applications du traitement automatique des langues. Interface homme-machine [cs.HC]. Université Rennes 1, 2002. ⟨tel-00533657⟩

Metrics

  • Record views : 564
  • File downloads : 796