A graph-based information retrieval model using structural similarities to improve the information retrieval process

Abstract: The main objective of an information retrieval (IR) system is to select, from a collection of documents, those that are relevant to a user's information need. Traditional approaches to document/query comparison rely on surface similarity, i.e. the comparison engine uses only surface attributes (indexing terms). We propose a new method based on a particular kind of similarity, namely structural similarity, which uses both surface attributes and the relations between those attributes. This kind of similarity was inspired by cognitive studies and by a general similarity measure based on node comparison in a bipartite graph. We propose an adaptation of this general method to the specific context of information retrieval. The adaptation takes the specificities of the domain into account: data type, weighted edges, and the choice of normalization. The core problem is how documents are compared against queries. The idea we develop is that similar documents share similar terms and similar terms appear in similar documents. We developed an algorithm that implements this idea. We then studied its convergence and complexity, ran tests on classical test collections, and compared our measure with two others that are references in our domain. The report is structured in five chapters. The first chapter deals with the comparison problem and related concepts such as similarity; we explain different points of view and propose an analogy between a cognitive similarity model and the IR model. The second chapter presents the IR task, test collections, and the measures used to evaluate a ranked list of documents. The third chapter introduces graph definitions: since our model is based on a bipartite graph representation, we define graphs and the criteria used to evaluate them. The fourth chapter describes how we adopted and adapted the general comparison method.
The fifth chapter describes how we evaluated the ranking performance of our method and how we compared it with the two other measures.
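The central idea of the abstract, that similar documents share similar terms and similar terms appear in similar documents, can be illustrated by a mutual-reinforcement iteration on a weighted document-term bipartite graph. The sketch below is not the thesis's algorithm: it swaps in a damped, SimRank-style update (damping factor `c`, identity term `(1 - c) I`), and the names `structural_similarity` and `rank_documents`, the row normalization of edge weights, and the convention of treating the query as an extra document node are all hypothetical assumptions made for illustration.

```python
import numpy as np


def structural_similarity(W, c=0.8, tol=1e-6, max_iter=100):
    """Damped mutual-reinforcement similarity on a document-term bipartite graph.

    Hypothetical SimRank-style sketch, not the thesis's exact method.
    W: (n_docs, n_terms) non-negative edge weights, e.g. tf-idf.
    Returns (S_D, S_T): document-document and term-term similarity matrices.
    """
    W = np.asarray(W, dtype=float)
    n_docs, n_terms = W.shape
    doc_sums = W.sum(axis=1, keepdims=True)   # shape (n_docs, 1)
    term_sums = W.sum(axis=0, keepdims=True)  # shape (1, n_terms)
    # P[d, t]: share of document d's weight carried by term t (rows sum to 1).
    P = np.divide(W, doc_sums, out=np.zeros_like(W), where=doc_sums > 0)
    # Q[t, d]: share of term t's weight carried by document d (rows sum to 1).
    Q = np.divide(W, term_sums, out=np.zeros_like(W), where=term_sums > 0).T

    I_D, I_T = np.eye(n_docs), np.eye(n_terms)
    S_D, S_T = I_D.copy(), I_T.copy()
    for _ in range(max_iter):
        # Similar documents share similar terms ...
        new_D = (1 - c) * I_D + c * (P @ S_T @ P.T)
        # ... and similar terms appear in similar documents.
        new_T = (1 - c) * I_T + c * (Q @ new_D @ Q.T)
        delta = max(np.abs(new_D - S_D).max(), np.abs(new_T - S_T).max())
        S_D, S_T = new_D, new_T
        if delta < tol:  # stop once both matrices have stabilised
            break
    return S_D, S_T


def rank_documents(W_docs, q, c=0.8):
    """Rank documents by structural similarity to the query vector q.

    Assumption: the query is appended to the graph as an extra document node.
    """
    W = np.vstack([W_docs, q])
    S_D, _ = structural_similarity(W, c=c)
    scores = S_D[-1, :-1]  # similarity of the query node to each real document
    order = np.argsort(-scores, kind="stable")
    return order, scores


if __name__ == "__main__":
    # Toy collection: 3 documents over 4 indexing terms (t0..t3).
    docs = [[1, 1, 0, 0],   # doc 0: t0, t1
            [0, 1, 1, 0],   # doc 1: t1, t2
            [0, 0, 0, 1]]   # doc 2: t3
    query = [1, 0, 0, 0]    # query: t0
    order, scores = rank_documents(docs, query)
    print(order.tolist())   # prints [0, 1, 2]
```

In this toy collection, doc 1 shares no term with the query, yet it receives a non-zero score because its term t1 co-occurs with the query term t0 in doc 0; doc 2, which lies in a disconnected component of the graph, scores exactly 0. Surface (term-overlap) similarity cannot make that distinction. Because `P` and `Q` are row-stochastic and `c < 1`, one full pass of the loop is a contraction in the max norm (factor `c²`), so the iteration converges; with dense matrices each pass costs roughly O(n_docs · n_terms · (n_docs + n_terms)).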

Cited literature: 223 references

https://tel.archives-ouvertes.fr/tel-00446372
Contributor: Yaël Champclaux
Submitted on: Tuesday, January 12, 2010 - 3:50:37 PM
Last modification on: Thursday, June 27, 2019 - 4:27:42 PM
Long-term archiving on: Wednesday, November 30, 2016 - 10:32:13 AM

Identifiers

  • HAL Id: tel-00446372, version 1


Citation

Yaël Champclaux. Un modèle de recherche d'information basé sur les graphes et les similarités structurelles pour l'amélioration du processus de recherche d'information. Computer Science [cs]. Université Paul Sabatier - Toulouse III, 2009. In French. ⟨tel-00446372⟩


Metrics

Record views: 741

File downloads: 5404