
Learning Image Classification and Retrieval Models

Thomas Mensink 1, 2
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract : We are currently experiencing an exceptional growth of visual data; for example, millions of photos are shared daily on social networks. Image understanding methods aim to facilitate access to this visual data in a semantically meaningful manner. In this dissertation, we define several detailed goals that are of interest for the image understanding tasks of image classification and retrieval, and we address them in three main chapters. First, we aim to exploit the multi-modal nature of many databases, wherein documents consist of images with a form of textual description. To do so, we define similarities between the visual content of one document and the textual description of another document. These similarities are computed in two steps: first we find the visually similar neighbors in the multi-modal database, and then we use the textual descriptions of these neighbors to define a similarity to the textual description of any document. Second, we introduce a series of structured image classification models, which explicitly encode pairwise label interactions. These models are more expressive than independent label predictors and lead to more accurate predictions, especially in an interactive prediction scenario where a user provides the value of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labeling effort. We explore structured models for multi-label image classification, for attribute-based image classification, and for optimizing for specific ranking measures. Finally, we explore k-nearest neighbors and nearest-class mean classifiers for large-scale image classification. We propose efficient metric learning methods to improve classification performance, and use these methods to learn on a data set of more than one million training images from one thousand classes.
Since both classification methods allow for the incorporation of classes not seen during training at near-zero cost, we study their generalization performance. We show that the nearest-class mean classification method can generalize from one thousand to ten thousand classes at negligible cost, and still perform competitively with the state-of-the-art.
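To illustrate why a nearest-class mean classifier can absorb unseen classes at near-zero cost, here is a minimal sketch. It is not the dissertation's implementation: it assumes plain Euclidean distance rather than a learned metric, and the class name `NearestClassMean`, its methods, and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


class NearestClassMean:
    """Toy nearest-class-mean classifier.

    Assumption: plain Euclidean distance is used here; the dissertation
    learns a metric instead, which this sketch does not attempt.
    """

    def __init__(self):
        self.means = {}  # label -> mean feature vector of that class

    def add_class(self, label, examples):
        """Register a class by averaging its example vectors."""
        n = len(examples)
        self.means[label] = [sum(column) / n for column in zip(*examples)]

    def predict(self, x):
        """Return the label whose class mean is closest to x."""
        return min(self.means, key=lambda label: dist(x, self.means[label]))


clf = NearestClassMean()
clf.add_class("cat", [[0.0, 0.0], [0.0, 2.0]])  # mean is (0, 1)
clf.add_class("dog", [[4.0, 0.0], [4.0, 2.0]])  # mean is (4, 1)
before = clf.predict([9.0, 9.0])

# Adding a class later only requires computing one more mean;
# the existing class means are left untouched.
clf.add_class("bird", [[8.0, 8.0], [10.0, 10.0]])  # mean is (9, 9)
after = clf.predict([9.0, 9.0])
```

In this sketch the query point is first assigned to the closest existing mean, and after the extra class is registered it is assigned to that new class, without revisiting any earlier training data.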
Contributor : Thoth Team
Submitted on : Wednesday, November 14, 2012 - 4:27:05 PM
Last modification on : Tuesday, October 19, 2021 - 11:13:04 PM
Long-term archiving on : Saturday, December 17, 2016 - 10:26:54 AM


  • HAL Id : tel-00752022, version 1




Thomas Mensink. Learning Image Classification and Retrieval Models. Computer Vision and Pattern Recognition [cs.CV]. Université de Grenoble, 2012. English. ⟨tel-00752022v1⟩


