Enhanced image and video representation for visual recognition

Mihir Jain 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This thesis concerns image and video representations for visual recognition. It first focuses on image search, for both image and textual queries, and then considers the classification and localization of actions in videos. In image retrieval, images similar to the query image are retrieved from a large dataset. For this task, we propose an asymmetric version of the Hamming Embedding method, in which query and database descriptors are compared through a vector-to-binary-code comparison. For image classification, where the task is to determine whether an image contains any instance of the queried category, we propose a novel approach based on a match kernel between images, more specifically one based on Hamming Embedding similarity. We also present an effective variant of the SIFT descriptor that yields better classification accuracy. Action classification is improved by several methods that better exploit the motion inherent to videos: dominant motion compensation, and a novel descriptor based on kinematic features of the visual flow. The last contribution is devoted to action localization, whose objective is to determine where and when the action of interest appears in the video. A selective sampling strategy produces 2D+t sequences of bounding boxes, which drastically reduces the number of candidate locations. The method exploits a criterion that takes into account how the motion related to actions deviates from the background motion. We thoroughly evaluated all the proposed methods on real-world images and videos from challenging benchmarks. Our methods outperform the previously published related state of the art and remain competitive with subsequently proposed methods.
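The vector-to-binary-code comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each descriptor has already been projected to a few real-valued components and that per-component thresholds are given (in Hamming Embedding these would typically be learned, e.g. as medians per visual word). The weighting used here, where each disagreeing bit costs the query's distance to the threshold, is one plausible asymmetric form and not necessarily the exact formula of the thesis. The function names `binarize`, `symmetric_hamming` and `asymmetric_distance` are hypothetical.

```python
import numpy as np


def binarize(proj, thresholds):
    """Binary signature: bit i is 1 when projected component i exceeds its threshold."""
    return (proj > thresholds).astype(np.uint8)


def symmetric_hamming(q_bits, db_bits):
    """Standard (symmetric) Hamming distance: both sides are binary codes."""
    return int(np.count_nonzero(q_bits != db_bits))


def asymmetric_distance(q_proj, db_bits, thresholds):
    """Asymmetric distance: the query keeps its real-valued projection (assumed form).

    Every bit on which the database code disagrees with the query's side of the
    threshold costs |q_proj[i] - thresholds[i]|. A query component lying close to
    its threshold (an unreliable bit) is therefore penalized less than one far from it.
    """
    q_bits = binarize(q_proj, thresholds)
    disagree = q_bits != db_bits
    return float(np.sum(np.abs(q_proj - thresholds)[disagree]))


# Toy example with hypothetical values: three projected components, zero thresholds.
q = np.array([0.1, -2.0, 0.5])
t = np.zeros(3)
a = np.array([0, 0, 0], dtype=np.uint8)  # disagrees on two weakly determined bits
b = np.array([1, 1, 1], dtype=np.uint8)  # disagrees on one strongly determined bit

q_bits = binarize(q, t)
print(symmetric_hamming(q_bits, a), symmetric_hamming(q_bits, b))
print(round(asymmetric_distance(q, a, t), 3), round(asymmetric_distance(q, b, t), 3))
```

In this toy case the symmetric Hamming distance ranks `b` closer (1 differing bit versus 2), while the asymmetric distance ranks `a` closer (0.6 versus 2.0). Keeping the query unquantized lets the comparison discount bits whose value is uncertain, which is the motivation for comparing a vector to a binary code rather than two binary codes.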
Cited literature: 170 references

https://tel.archives-ouvertes.fr/tel-00996793
Contributor : Hervé Jégou
Submitted on : Tuesday, May 27, 2014 - 12:35:46 AM
Last modification on : Friday, November 16, 2018 - 1:23:52 AM
Long-term archiving on : Wednesday, August 27, 2014 - 10:45:29 AM

Identifiers

  • HAL Id : tel-00996793, version 1

Citation

Mihir Jain. Enhanced image and video representation for visual recognition. Computer Vision and Pattern Recognition [cs.CV]. Université Rennes 1, 2014. English. ⟨tel-00996793⟩

Metrics

Record views: 682
File downloads: 1074