Skip to Main content Skip to Navigation

Learning with Limited Annotated Data for Visual Understanding

Mikita Dvornik 1, 2
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract : The ability of deep-learning methods to excel in computer vision highly depends on the amount of annotated data available for training. For some tasks, annotation may be too costly and labor intensive, thus becoming the main obstacle to better accuracy. Algorithms that learn from data automatically, without human supervision, perform substantially worse than their fully-supervised counterparts. Thus, there is a strong motivation to work on effective methods for learning with limited annotations. This thesis proposes to exploit prior knowledge about the task and develops more effective solutions for scene understanding and few-shot image classification.Main challenges of scene understanding include object detection, semantic and instance segmentation. Similarly, all these tasks aim at recognizing and localizing objects, at region- or more precise pixel-level, which makes the annotation process difficult. The first contribution of this manuscript is a Convolutional Neural Network (CNN) that performs both object detection and semantic segmentation. We design a specialized network architecture, that is trained to solve both problems in one forward pass, and operates in real-time. Thanks to the multi-task training procedure, both tasks benefit from each other in terms of accuracy, with no extra labeled data.The second contribution introduces a new technique for data augmentation, i.e., artificially increasing the amount of training data. It aims at creating new scenes by copy-pasting objects from one image to another, within a given dataset. Placing an object in a right context was found to be crucial in order to improve scene understanding performance. We propose to model visual context explicitly using a CNN that discovers correlations between object categories and their typical neighborhood, and then proposes realistic locations for augmentation. Overall, pasting objects in ``right'' locations allows to improve object detection and segmentation performance, with higher gains in limited annotation scenarios.For some problems, the data is extremely scarce, and an algorithm has to learn new concepts from a handful of examples. Few-shot classification consists of learning a predictive model that is able to effectively adapt to a new class, given only a few annotated samples. While most current methods concentrate on the adaptation mechanism, few works have tackled the problem of scarce training data explicitly. In our third contribution, we show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform more sophisticated existing techniques. Our approach consists of designing an ensemble of deep networks to leverage the variance of the classifiers, and introducing new strategies to encourage the networks to cooperate, while encouraging prediction diversity. By matching different networks outputs on similar input images, we improve model accuracy and robustness, comparing to classical ensemble training. Moreover, a single network obtained by distillation shows similar to the full ensemble performance and yields state-of-the-art results with no computational overhead at test time.
Complete list of metadatas

Cited literature [190 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Wednesday, April 1, 2020 - 10:03:10 AM
Last modification on : Wednesday, December 16, 2020 - 3:10:40 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02527279, version 1



Mikita Dvornik. Learning with Limited Annotated Data for Visual Understanding. Computer Vision and Pattern Recognition [cs.CV]. Université Grenoble Alpes, 2019. English. ⟨NNT : 2019GREAM050⟩. ⟨tel-02527279⟩



Record views


Files downloads