Conception d'architectures profondes pour l'interprétation de données visuelles [Design of deep architectures for the interpretation of visual data]

Abstract: Images are now ubiquitous, largely through smartphones and social media, so automatic processing is needed to analyze and interpret the large amount of available data. This thesis addresses object detection, i.e. the problem of identifying and localizing all objects present in an image, which can be seen as a first step toward a complete visual understanding of scenes. We tackle it with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is its need for labeled training data. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions that work with such partial labels and recover latent information in two cases: learning spatially localized representations from image-level supervision, and learning part-based representations from object-level supervision. We address the efficiency of end-to-end learning of these representations by leveraging fully convolutional networks. Separately, exploiting extra annotations on the images already available can be an alternative to collecting more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to learn effectively from this auxiliary supervision within this framework.

Cited literature: 189 references

https://tel.archives-ouvertes.fr/tel-02057434
Contributor: Abes Star
Submitted on: Monday, June 15, 2020 - 3:09:24 PM
Last modified on: Tuesday, March 23, 2021 - 9:28:03 AM

File

2018SORUS270.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-02057434, version 2

Citation

Taylor Mordan. Conception d'architectures profondes pour l'interprétation de données visuelles. Computer Vision and Pattern Recognition [cs.CV]. Sorbonne Université, 2018. English. ⟨NNT : 2018SORUS270⟩. ⟨tel-02057434v2⟩
