Classification d'images et localisation d'objets par des méthodes de type noyau de Fisher

Ramazan Gokberk Cinbis 1
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : In this dissertation, we propose models and methods targeting image understanding tasks. In particular, we focus on Fisher kernel based approaches for the image classification and object localization problems. We group our studies into the following three main chapters. First, we propose novel image descriptors based on non-i.i.d. image models. Our starting point is the observation that local image regions are implicitly assumed to be identically and independently distributed (i.i.d.) in the bag-of-words (BoW) model. We introduce non-i.i.d. models by treating the parameters of the BoW model as latent variables, which renders all local regions dependent. Using the Fisher kernel framework we encode an image by the gradient of the data log-likelihood with respect to model hyper-parameters. Our representation naturally involves discounting transformations, providing an explanation of why such transformations have proven successful. Using variational inference we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words. Second, we present an object detection system based on the high-dimensional Fisher vectors image representation. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. We show that re-weighting the local image features based on these masks improve object detection performance significantly. Third, we propose a weakly supervised object localization approach. Standard supervised training of object detectors requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning, which requires only binary class labels that indicate the absence/presence of object instances. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. We show that this procedure is particularly important when high-dimensional representations, such as the Fisher vectors, are used. Finally, in the appendix of the thesis, we present our work on person identification in uncontrolled TV videos. We show that cast-specific distance metrics can be learned without labeling any training examples by utilizing face pairs within tracks and across temporally-overlapping tracks. We show that the obtained metrics improve face-track identification, recognition and clustering performances.
Complete list of metadatas

Cited literature [295 references]  Display  Hide  Download
Contributor : Abes Star <>
Submitted on : Monday, October 6, 2014 - 11:39:29 AM
Last modification on : Thursday, December 20, 2018 - 1:26:35 AM
Long-term archiving on : Thursday, January 8, 2015 - 4:15:25 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01071581, version 1



Ramazan Gokberk Cinbis. Classification d'images et localisation d'objets par des méthodes de type noyau de Fisher. Autre [cs.OH]. Université de Grenoble, 2014. Français. ⟨NNT : 2014GRENM024⟩. ⟨tel-01071581⟩



Record views


Files downloads