
Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion

Abstract : Efficient communication tends to follow the principle of least effort. According to this principle, interlocutors using a given language do not want to work any harder than necessary to reach understanding. This leads to extreme compression of texts, especially in electronic communication, e.g. microblogs, SMS, search queries. However, such texts are sometimes not self-contained and need to be explained, since understanding them requires knowledge of terminology, named entities or related facts. The main goal of this research is to provide a context to a user or a system from a textual resource. The first aim of this work is to help a user better understand a short message by extracting a context from an external source, such as a text collection, the Web or Wikipedia, by means of text summarization. To this end, we developed an approach for automatic multi-document summarization and applied it to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method was evaluated at the INEX/CLEF Tweet Contextualization track. We provide the evaluation results over the 4 years of the track. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach.
Contributor : Abes Star
Submitted on : Monday, March 12, 2018 - 4:55:07 PM
Last modification on : Thursday, March 26, 2020 - 8:10:43 PM
Document(s) archived on : Wednesday, June 13, 2018 - 2:36:24 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01729649, version 1



Liana Ermakova. Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion. Information Retrieval [cs.IR]. Université Toulouse le Mirail - Toulouse II, 2016. English. ⟨NNT : 2016TOU20023⟩. ⟨tel-01729649⟩


