Designing Deep Architectures for Visual Understanding

Abstract : Images are now ubiquitous thanks to smartphones and social media, so automatic means of processing them are needed to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. We tackle it with deep convolutional neural networks, under the deep learning paradigm. One drawback of this approach is that it needs large amounts of labeled data to learn from. Since precise annotations are time-consuming to produce, we first rely on bigger datasets built with cheaper image-level labels. We design a global pooling function that works with these labels and recovers latent information about the spatial localization of objects. We then turn to the usual object-level annotations and introduce several new modules to learn part-based representations. Because these representations are more flexible than standard bounding boxes and exploit latent object structure, they yield finer descriptions. We address the efficiency of learning both kinds of latent representation end to end by leveraging fully convolutional networks. In addition, exploiting additional annotations on available images can be an alternative to collecting more images, especially when images are difficult to obtain. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to learn effectively from this auxiliary supervision under this framework. All models are evaluated thoroughly on standard datasets and achieve results competitive with the literature.

Cited literature [189 references]

https://tel.archives-ouvertes.fr/tel-02057434
Contributor : Taylor Mordan
Submitted on : Tuesday, March 5, 2019 - 12:08:49 PM
Last modification on : Friday, July 5, 2019 - 3:26:03 PM
Long-term archiving on : Thursday, June 6, 2019 - 2:11:26 PM

File

Taylor_MORDAN_PhD_thesis.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02057434, version 1

Citation

Taylor Mordan. Designing Deep Architectures for Visual Understanding. Computer Vision and Pattern Recognition [cs.CV]. EDITE, 2018. English. ⟨tel-02057434⟩
