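Modélisation d'un système de recherche d'information pour les systèmes hypertextes. Application à la recherche d'information sur le World Wide Web
[Modelling an information retrieval system for hypertext systems. Application to information retrieval on the World Wide Web]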

Abstract : In a hypertext, a document is often composed of a set of nodes rather than a single one. The information a page conveys might not be fully grasped if only its own content is considered: the contents of the other pages that, together with it, make up one document carry contextual information. Taking this contextual information into account when indexing pages is fundamental to the quality of their index. Information retrieval systems for the Web, commonly known as Web search engines, should take into account that Web documents are split across several pages: a page should not be treated as a fully-fledged document, since it is only a part of one. Therefore, when indexing a page, one should consider its contextual information, which is often located in its neighborhood. Traditionally, Web search engines treat pages as fully-fledged documents and build their indexes only from the pages' own contents; contextual information is not considered. In this work we put forward a new information retrieval model for search engines running over Web sites. Its cornerstone is a 2-level index for the pages composing the site: the bottom level is constructed solely from the content of the page itself, and the top level is constructed from an analysis of the contents of the pages that give context to the page being indexed. We aim to improve the effectiveness of the search engine by improving the quality of the pages' index. Implementing a search engine prototype that integrates the proposed model, and using the WT10g test collection from the TREC conferences, adapted to our needs, allowed us to carry out a large number of tests. The results showed that the prototype was more effective than a search engine integrating a traditional model in which contextual information is not used to index pages. The tests thus provide evidence that contextual information might be worth considering when modelling a search engine.
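
The 2-level index described in the abstract can be illustrated with a minimal Python sketch. It is not the model from the thesis: the whitespace tokenizer, the raw term-frequency scoring, the linear `context_weight` combination, and the `context_of` mapping (which pages give context to which) are all assumptions introduced here for illustration only.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption, for illustration).
    return text.lower().split()


def build_two_level_index(pages, context_of):
    """Build a hypothetical 2-level index.

    pages:      dict mapping page id -> page text
    context_of: dict mapping page id -> list of ids of pages giving it context
    """
    index = {}
    for pid, text in pages.items():
        # Bottom level: terms from the page's own content only.
        bottom = Counter(tokenize(text))
        # Top level: terms from the pages that give context to this page.
        top = Counter()
        for cid in context_of.get(pid, []):
            top.update(tokenize(pages[cid]))
        index[pid] = {"bottom": bottom, "top": top}
    return index


def score(index, pid, query, context_weight=0.5):
    # Assumed scoring: own-content term frequency plus a down-weighted
    # contribution from the contextual (top-level) term frequency.
    entry = index[pid]
    return sum(
        entry["bottom"][term] + context_weight * entry["top"][term]
        for term in tokenize(query)
    )


def search(index, query, context_weight=0.5):
    # Rank pages with a non-zero score, best first; ties broken by page id.
    scored = [(pid, score(index, pid, query, context_weight)) for pid in index]
    hits = [(pid, s) for pid, s in scored if s > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical three-page site: "home" gives context to the other two pages.
pages = {
    "home": "jazz festival lyon programme",
    "tickets": "prices and booking",
    "contact": "phone address",
}
context_of = {"tickets": ["home"], "contact": ["home"]}
index = build_two_level_index(pages, context_of)
results = search(index, "jazz booking")
```

In this toy site the "tickets" page never mentions "jazz", so an index built from page content alone (here, `context_weight=0`) cannot match it on that term; with the top level included, the term is reachable through the context inherited from "home".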
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00818333
Contributor : Florent Breuil
Submitted on : Friday, April 26, 2013 - 3:56:33 PM
Last modification on : Wednesday, June 24, 2020 - 4:18:07 PM
Long-term archiving on : Saturday, July 27, 2013 - 4:40:16 AM

Identifiers

  • HAL Id : tel-00818333, version 1

Citation

Fernando Jorge Carvalho de Aguiar. Modélisation d'un système de recherche d'information pour les systèmes hypertextes. Application à la recherche d'information sur le World Wide Web. Web. Ecole Nationale Supérieure des Mines de Saint-Etienne; Université Jean Monnet - Saint-Etienne, 2002. Français. ⟨NNT : 2001EMSE0026⟩. ⟨tel-00818333⟩
