
Deep Neural Learning for the Analysis of Multimodal and Temporal Content

Valentin Vielzeuf 1
1 Image Team - GREYC Laboratory - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract: Our perception is multimodal by nature, i.e. it draws on several of our senses. To solve certain tasks, it is therefore relevant to use different modalities, such as sound or image. This thesis focuses on this notion in the context of deep learning. To this end, it seeks to answer a specific question: how should the different modalities be fused within a deep neural network? We first study a concrete application problem: the automatic recognition of emotion in audio-visual content. This leads us to several considerations on the modeling of emotions, and more particularly of facial expressions. We thus propose an analysis of the representations of facial expressions learned by a deep neural network. In addition, we observe that each multimodal problem appears to require a different fusion strategy. This is why we propose and validate two methods to automatically obtain an efficient fusion neural architecture for a given multimodal problem. The first is based on a central fusion network and aims to keep the adopted fusion strategy easy to interpret, while the second adapts a neural architecture search method to multimodal fusion, exploring a greater number of strategies and therefore achieving better performance. Finally, we take a multimodal view of knowledge transfer: we detail a non-traditional method to transfer knowledge from several sources, i.e. from several pre-trained models. To do so, a more general neural representation is obtained from a single model, which brings together the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of facial analysis tasks.

Cited literature: 357 references
Contributor: Abes Star
Submitted on: Monday, January 13, 2020 - 3:13:23 PM
Last modification on: Monday, February 10, 2020 - 3:48:58 PM
Long-term archiving on: Tuesday, April 14, 2020 - 4:39:06 PM


Version validated by the jury (STAR)


  • HAL Id: tel-02437035, version 1


Valentin Vielzeuf. Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels. Bio-informatique [q-bio.QM]. Normandie Université, 2019. French. ⟨NNT : 2019NORMC229⟩. ⟨tel-02437035⟩


