Vers une représentation du contexte thématique en Recherche d'Information

Abstract : When searching for information within knowledge bases or document collections, humans use an information retrieval system (IRS). So that it can retrieve documents containing relevant information, users have to provide the IRS with a representation of their information need. Nowadays, this representation of the information need is composed of a small set of keywords often referred to as the "query". A few words may however not be sufficient to accurately and effectively represent the complete cognitive state of a human with respect to her initial information need. A query may not contain sufficient information if the user is searching for some topic in which she is not confident at all. Hence, without some kind of context, the IRS could simply miss some nuances or details that the user did not, or could not, provide in the query.

In this thesis, we explore and propose various statistical, automatic and unsupervised methods for representing the topical context of the query. More specifically, we aim to identify the latent concepts of a query without involving the user in the process or requiring explicit feedback. We experiment with using and combining several general information sources representing the main types of information we deal with on a daily basis while browsing the Web. We also leverage probabilistic topic models (such as Latent Dirichlet Allocation) in a pseudo-relevance feedback setting. In addition, we propose a method that jointly estimates the number of latent concepts of a query and the set of pseudo-relevant feedback documents most suitable for modeling these concepts. We evaluate our approaches using four main large TREC test collections. In the appendix of this thesis, we also propose an approach for contextualizing short messages which leverages both information retrieval and automatic summarization techniques.
Document type :
Theses

Cited literature : 69 references

https://tel.archives-ouvertes.fr/tel-00918877
Contributor : Abes Star
Submitted on : Friday, June 6, 2014 - 1:22:08 PM
Last modification on : Saturday, March 23, 2019 - 1:22:48 AM
Long-term archiving on : Saturday, September 6, 2014 - 12:00:31 PM

File

thesis_romain_deveaud.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00918877, version 2

Citation

Romain Deveaud. Vers une représentation du contexte thématique en Recherche d'Information. Other [cs.OH]. Université d'Avignon, 2013. French. ⟨NNT : 2013AVIG0198⟩. ⟨tel-00918877v2⟩

Metrics

Record views

831

File downloads

480