Semantic representations of images and videos

Abstract : Recent research in Deep Learning has sharply improved the quality of results in multimedia tasks: thanks to new large datasets of annotated images and videos, Deep Neural Networks (DNNs) have outperformed other models in most cases. In this thesis, we aim to develop DNN models that automatically derive semantic representations of images and videos. We focus on two main tasks: vision-text matching and automatic image/video captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space, or a multimodal space. Building on recent work on capsule networks, we define two novel models for the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. Image and video captioning is a challenging task in which a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. For video captioning in particular, analyzing a video requires not only parsing still images but also drawing correspondences across time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets evaluate our models and methods against existing work.
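As an illustration of what comparing visual objects and texts in a multimodal space can mean, here is a minimal sketch that ranks captions by cosine similarity to an image embedding. It is not the thesis's method (which relies on capsule-based models); the function names `cosine_similarity` and `rank_captions` are hypothetical, and the embedding vectors are hand-written toy values standing in for the outputs of trained image and text encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_captions(image_embedding, caption_embeddings):
    """Return (score, caption) pairs sorted from most to least similar to the image."""
    scored = [
        (cosine_similarity(image_embedding, embedding), caption)
        for caption, embedding in caption_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy embeddings (assumed values): in practice these would come from
# an image encoder and a text encoder projecting into the same space.
image_vec = [0.9, 0.1, 0.0]
captions = {
    "a dog running on grass": [0.8, 0.2, 0.1],
    "a plate of pasta": [0.0, 0.1, 0.9],
}

ranking = rank_captions(image_vec, captions)
best_caption = ranking[0][1]
```

In this sketch, matching reduces to a nearest-neighbour search in the shared space; the quality of the result depends entirely on how the two encoders are trained.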
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03356457
Contributor : Abes Star
Submitted on : Tuesday, September 28, 2021 - 10:04:28 AM
Last modification on : Tuesday, November 16, 2021 - 5:13:18 AM

File

FRANCIS_Danny_2019.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03356457, version 1

Citation

Danny Francis. Semantic representations of images and videos. Artificial Intelligence [cs.AI]. Sorbonne Université, 2019. English. ⟨NNT : 2019SORUS605⟩. ⟨tel-03356457⟩

Metrics

Record views : 70
File downloads : 43