Learning Image Classification and Retrieval Models

Abstract : We are currently experiencing an exceptional growth of visual data; for example, millions of photos are shared daily on social networks. Image understanding methods aim to facilitate access to this visual data in a semantically meaningful manner. In this dissertation, we define several detailed goals that are of interest for the image understanding tasks of image classification and retrieval, and we address them in three main chapters. First, we aim to exploit the multi-modal nature of many databases, wherein documents consist of images with some form of textual description. To do so, we define similarities between the visual content of one document and the textual description of another document. These similarities are computed in two steps: first we find the visually similar neighbors in the multi-modal database, and then we use the textual descriptions of these neighbors to define a similarity to the textual description of any document. Second, we introduce a series of structured image classification models, which explicitly encode pairwise label interactions. These models are more expressive than independent label predictors and lead to more accurate predictions, especially in an interactive prediction scenario where a user provides the values of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labeling effort. We explore structured models for multi-label image classification, for attribute-based image classification, and for optimizing for specific ranking measures. Finally, we explore k-nearest neighbors and nearest-class mean classifiers for large-scale image classification. We propose efficient metric learning methods to improve classification performance, and use these methods to learn on a data set of more than one million training images from one thousand classes.
Since both classification methods allow for the incorporation of classes not seen during training at near-zero cost, we study their generalization performance. We show that the nearest-class mean classification method can generalize from one thousand to ten thousand classes at negligible cost, and still perform competitively with the state-of-the-art.
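
To make the nearest-class mean idea in the abstract concrete, the following is a minimal sketch and not the thesis implementation. It assumes feature vectors are NumPy arrays and that an optional linear projection (standing in for a learned metric) is supplied from elsewhere; the class name `NearestClassMean` and its methods are hypothetical names chosen for illustration. It shows why a new class can be added at near-zero cost: only its mean needs to be computed.

```python
import numpy as np


class NearestClassMean:
    """Illustrative nearest-class mean classifier (hypothetical helper)."""

    def __init__(self, projection=None):
        # Optional matrix of shape (d_out, d_in); assumed to come from some
        # metric learning step that is not shown here.
        self.projection = projection
        self.means = {}

    def _project(self, x):
        x = np.asarray(x, dtype=float)
        return x if self.projection is None else x @ self.projection.T

    def add_class(self, label, features):
        # Adding a class only requires the mean of its (projected) features.
        self.means[label] = self._project(features).mean(axis=0)

    def predict(self, x):
        z = self._project(np.atleast_2d(x))
        labels = list(self.means)
        centers = np.stack([self.means[label] for label in labels])
        # Squared Euclidean distance from every sample to every class mean.
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return [labels[i] for i in d2.argmin(axis=1)]


clf = NearestClassMean()
clf.add_class("cat", [[0.0, 0.0], [0.2, 0.0]])
clf.add_class("dog", [[1.0, 1.0], [1.0, 0.8]])
clf.add_class("bird", [[5.0, 5.0]])  # added later, no retraining needed
print(clf.predict([[0.1, 0.1], [4.0, 4.5]]))
```

In this sketch the projection stays fixed when a class is added, which is the property that makes generalizing to unseen classes cheap; how well a fixed metric transfers to new classes is the empirical question the abstract refers to.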

Cited literature : 189 references

https://tel.archives-ouvertes.fr/tel-00752022
Contributor : Abes Star
Submitted on : Wednesday, June 28, 2017 - 9:49:29 AM
Last modification on : Thursday, June 21, 2018 - 3:42:09 PM
Long-term archiving on : Wednesday, January 17, 2018 - 10:30:19 PM

File

MENSINK_2012_diffusion.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00752022, version 2

Citation

Thomas Mensink. Learning Image Classification and Retrieval Models. Information Retrieval [cs.IR]. Université de Grenoble, 2012. English. ⟨NNT : 2012GRENM113⟩. ⟨tel-00752022v2⟩

Metrics

Record views : 1228
File downloads : 1626