
Self-supervised learning of deep visual representations

Mathilde Caron 1, 2 
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract : Humans and many animals can see the world and understand it effortlessly, which gives some hope that visual perception could be realized by computers and Artificial Intelligence. More importantly, living beings acquire such an understanding of the visual world autonomously, without the intervention of a supervisor explicitly telling them what, where or who is to be seen. This suggests that visual perception can be achieved without too much explicit human supervision, simply by letting systems observe large amounts of visual inputs.
In particular, this manuscript tackles the problem of self-supervised learning, which consists in training deep neural network systems without using any human annotations. Typically, neural networks require large amounts of annotated data, which has limited their applications in fields where accessing these annotations is expensive or difficult. Moreover, manual annotations are biased towards a specific task and towards the annotator's own biases, which can result in noisy and unreliable signals. Training systems without annotations could lead to better, more generic and more robust representations. In this manuscript, we present different contributions to the fast-growing field of self-supervised visual representation learning.
We start by extending a promising category of self-supervised approaches, namely deep clustering, which trains deep networks while simultaneously mining groups of visually consistent images in a data collection. We then identify the limits of deep clustering methods, such as their difficulty scaling to very large datasets and the fact that they are prone to trivial solutions. As a result, we propose improved self-supervised methods that outperform their supervised counterparts on several benchmarks and exhibit interesting properties. For example, the resulting self-supervised networks contain generic representations that transfer well to different datasets and tasks. They also contain explicit information about the semantic segmentation of an image. Importantly, we also probe our self-supervised models in the wild, by training them on hundreds of millions of unlabeled images randomly selected from the Internet.
Contributor : ABES STAR
Submitted on : Monday, May 23, 2022 - 8:32:00 AM
Last modification on : Wednesday, June 1, 2022 - 11:09:11 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03675254, version 1



Mathilde Caron. Self-supervised learning of deep visual representations. Artificial Intelligence [cs.AI]. Université Grenoble Alpes [2020-..], 2021. English. ⟨NNT : 2021GRALM066⟩. ⟨tel-03675254⟩


