Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion

Abstract : Efficient communication tends to follow the principle of least effort. According to this principle, interlocutors using a given language do not want to work any harder than necessary to reach understanding. This leads to extreme compression of texts, especially in electronic communication, e.g. microblogs, SMS and search queries. However, these texts are sometimes not self-contained and need to be explained, since understanding them requires knowledge of terminology, named entities or related facts. The main goal of this research is to provide a context to a user or a system from a textual resource. The first aim of this work is to help a user better understand a short message by extracting a context from an external source, such as a text collection, the Web or Wikipedia, by means of text summarization. To this end, we developed an approach to automatic multi-document summarization and applied it to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method was evaluated at the INEX/CLEF tweet contextualization track, and we provide the evaluation results over the 4 years of the track. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach.
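The abstract mentions smoothing from the local context but does not specify the algorithm. As a rough illustration only, the sketch below assumes one plausible reading: each candidate sentence first receives a relevance score against the tweet, and that score is then averaged with the scores of its neighbouring sentences, so that a sentence next to a highly relevant one is promoted. The function names, the word-overlap scoring, and the `window` and `decay` parameters are all hypothetical and are not taken from the thesis.

```python
from typing import List


def overlap_score(query: str, sentence: str) -> float:
    """Fraction of distinct query terms that also occur in the sentence.

    Hypothetical stand-in for the thesis's sentence scoring, which also
    relies on named entities and part-of-speech weights.
    """
    query_terms = set(query.lower().split())
    sentence_terms = set(sentence.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & sentence_terms) / len(query_terms)


def smooth_scores(scores: List[float], window: int = 1,
                  decay: float = 0.5) -> List[float]:
    """Weighted average of each score with its neighbours.

    A neighbour at distance d (up to `window`) gets weight decay**d.
    This weighting scheme is an assumption made for illustration.
    """
    smoothed = []
    for i in range(len(scores)):
        total = 0.0
        weight_sum = 0.0
        lo = max(0, i - window)
        hi = min(len(scores), i + window + 1)
        for j in range(lo, hi):
            weight = decay ** abs(i - j)
            total += weight * scores[j]
            weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


if __name__ == "__main__":
    tweet = "tweet contextualization track"
    document = [
        "The campaign started in 2011.",
        "It hosted the tweet contextualization track.",
        "Participants submitted short summaries.",
    ]
    raw = [overlap_score(tweet, s) for s in document]
    print(smooth_scores(raw))
```

With these assumptions, a sentence that shares no words with the tweet can still receive a non-zero score when an adjacent sentence is relevant, which is the intuition behind using local context.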
Document type : Theses

https://tel.archives-ouvertes.fr/tel-01729649
Contributor : Abes Star
Submitted on : Monday, March 12, 2018 - 4:55:07 PM
Last modification on : Monday, April 29, 2019 - 5:29:22 PM
Long-term archiving on : Wednesday, June 13, 2018 - 2:36:24 PM

File

Ermakova_Liana.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01729649, version 1

Citation

Liana Ermakova. Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion. Information Retrieval [cs.IR]. Université Toulouse le Mirail - Toulouse II, 2016. English. ⟨NNT : 2016TOU20023⟩. ⟨tel-01729649⟩
