Of Learning Visual Representations Robust to Invariances for Image Classification and Retrieval

Mattis Paulin 1, 2
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract : This dissertation focuses on designing image recognition systems that are robust to geometric variability. Image understanding is a difficult problem: images are two-dimensional projections of 3D objects, and representations that must fall into the same category, for instance objects of the same class in classification, can display significant differences. Our goal is to make systems robust to the right amount of deformation, with this amount determined automatically from data. Our contributions are twofold. We show how to use virtual examples to enforce robustness in image classification systems, and we propose a framework to learn robust low-level descriptors for image retrieval. We first focus on virtual examples, obtained as transformations of real ones. One image generates a set of descriptors, one for each transformation, and we show that data augmentation, i.e. considering them all as i.i.d. samples, is the best-performing way to use them, provided a voting stage with the transformed descriptors is conducted at test time. Because transformations carry various levels of information, can be redundant, and can even be harmful to performance, we propose a new algorithm able to select a set of transformations while maximizing classification accuracy. We show that a small number of transformations is enough to considerably improve performance for this task. We also show how virtual examples can replace real ones for a reduced annotation cost. We report good performance on standard fine-grained classification datasets. In a second part, we aim at improving the local region descriptors used in image retrieval, and in particular at proposing an alternative to the popular SIFT descriptor. We propose new convolutional descriptors, called patch-CKN, which are learned without supervision.
We introduce a linked patch- and image-retrieval dataset based on structure from motion of web-crawled images, and design a method to accurately test the performance of local descriptors at patch and image levels. Our approach outperforms both SIFT and all tested approaches with convolutional architectures on our patch and image benchmarks, as well as on several state-of-the-art datasets.

Cited literature [133 references]

Contributor : Abes Star
Submitted on : Thursday, January 11, 2018 - 2:47:34 PM
Last modification on : Saturday, October 6, 2018 - 1:17:19 AM
Document(s) archived on : Tuesday, August 28, 2018 - 11:43:05 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01677852, version 3



Mattis Paulin. Of Learning Visual Representations Robust to Invariances for Image Classification and Retrieval. Artificial Intelligence [cs.AI]. Université Grenoble Alpes, 2017. English. ⟨NNT : 2017GREAM007⟩. ⟨tel-01677852v3⟩


