Text-Based Ephemeral Clustering for Web Image Retrieval on Mobile Devices (version 1)

José G. Moreno 1 
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : In this thesis, we present a study of the visualization of Web image results on mobile devices. Our main findings were inspired by recent advances in two research areas: Information Retrieval and Natural Language Processing. In the former, we considered topics such as search results clustering (SRC), Web mobile interfaces and query intent mining, to name but a few. In the latter, we focused on collocation measures, high-order similarity metrics, etc. To validate our hypothesis, we performed a large number of experiments with task-specific datasets. Several characteristics of the proposed solutions are evaluated. First, we evaluate the clustering quality, for which classical and recent evaluation metrics are considered. Second, we evaluate the labeling quality of each cluster to make sure that all possible query intents are covered. Third and finally, we evaluate the user's effort in exploring the images in a gallery-based interface. An entire chapter is dedicated to each of these three aspects, in which the datasets, some of them built to evaluate specific characteristics, are presented. The final results comprise two algorithms, two datasets and an SRC evaluation tool. Of the algorithms, Dual $C$-means is our main product. It can be seen as a generalization of our previously developed algorithm, $AGK$-means. Both are based on text-based similarity metrics. A new dataset for a complete evaluation of SRC algorithms is developed and presented. Similarly, a new Web image dataset is developed and used together with a new metric to measure the user's effort when a set of Web images is explored. Finally, we developed an evaluation tool for the SRC problem, in which we have implemented several classical and recent SRC metrics. Our conclusions are drawn considering the numerous factors discussed in this thesis. However, additional studies could be motivated by our findings. Some of them are discussed at the end of this study, and preliminary analysis suggests that they are promising directions.
Submitted on : Tuesday, January 13, 2015 - 11:18:37 AM
Last modification on : Saturday, June 25, 2022 - 9:49:33 AM
Long-term archiving on : Tuesday, April 14, 2015 - 10:41:58 AM


  • HAL Id : tel-01102604, version 1


José G. Moreno. Text-Based Ephemeral Clustering for Web Image Retrieval on Mobile Devices (version 1). Computation and Language [cs.CL]. Université de Caen Basse-Normandie, 2014. English. ⟨tel-01102604⟩



