Automatic Acquisition of Semantic Lexicons for Information Retrieval

Vincent Claveau 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Many applications in the field of Natural Language Processing (information retrieval, machine translation, etc.) need semantic resources that are specific to their tasks and domains. To satisfy this need, we have developed ASARES, a corpus-based lexical semantic acquisition system. It fulfills three objectives: it yields good extraction results; these results and the whole acquisition process are interpretable; and it is generic and automatic enough to be easily portable from one corpus to another. To achieve these goals, ASARES uses a machine learning method, inductive logic programming, which makes it possible to infer part-of-speech and semantic patterns from examples of the semantic elements we want to acquire. These patterns are then used to extract new elements from the corpus. We also show that this symbolic method can be combined with statistical acquisition methods to make ASARES more automatic. To validate our system, we have used it to acquire a type of semantic relation between nouns and verbs that is defined in the Generative Lexicon and called the qualia relation. This task is of interest for two main reasons. On the one hand, these relations are defined only from a theoretical point of view; the linguistic interpretation of the patterns thus provides a deeper understanding of their contextual realizations. On the other hand, several authors have noticed that such relations can be useful in information retrieval tasks because they make semantically equivalent reformulations of ideas accessible. With the help of a query expansion experiment using qualia relations extracted with ASARES, we show that this assumption is true to a certain extent: the performance of an information retrieval system is significantly improved, although the gains are localized.
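
To illustrate the general idea of query expansion with noun-verb pairs, the following minimal Python sketch adds to a query every term that is paired with one of its words. It is not the ASARES implementation, nor the expansion scheme used in the experiment, which the abstract does not detail. The pair list, `build_index` and `expand_query` are hypothetical names and data invented for this illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical noun-verb pairs, invented for illustration only;
# they are NOT output of ASARES.
QUALIA_PAIRS: List[Tuple[str, str]] = [
    ("book", "read"),
    ("book", "write"),
    ("door", "open"),
]


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each noun to its paired verbs and each verb to its paired nouns."""
    index: Dict[str, Set[str]] = {}
    for noun, verb in pairs:
        index.setdefault(noun, set()).add(verb)
        index.setdefault(verb, set()).add(noun)
    return index


def expand_query(terms: List[str], index: Dict[str, Set[str]]) -> List[str]:
    """Return the original terms followed by their paired terms, without duplicates."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        # Sort so that the expansion order is deterministic.
        for related in sorted(index.get(term, ())):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded
```

For example, under these assumed pairs, `expand_query(["book", "shop"], build_index(QUALIA_PAIRS))` keeps the two original terms and appends `read` and `write`, so a document that talks about reading could match a query about books.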
