
Apprentissage neuronal de caractéristiques spatio-temporelles pour la classification automatique de séquences vidéo (Neural learning of spatio-temporal features for automatic video sequence classification)

Abstract: This thesis addresses the automatic classification of video sequences. We aim to depart from the dominant methodology, which relies on so-called hand-crafted features, by proposing generic, problem-independent models. To this end, we automate the feature extraction process, which in our case is learned from training examples without any prior knowledge. We build on existing neural methods dedicated to object recognition in still images and investigate their extension to video. More concretely, we introduce two learning-based models that extract spatio-temporal features for video classification: (i) a deep learning model, trained in a supervised way, which can be seen as an extension of the popular ConvNets model to video, and (ii) an unsupervised learning model that relies on an auto-encoder scheme and a sparse over-complete representation. An additional contribution of this work is a comparative study of several sequence classification models. This study was performed using hand-crafted features designed specifically for the soccer action recognition problem. Based on its results, we selected the best classifier, a bidirectional long short-term memory recurrent neural network (BLSTM), for use in all experiments. To validate the genericity of the two proposed models, experiments were carried out on two different problems: human action recognition (using the KTH dataset) and facial expression recognition (using the GEMEP-FERA dataset). The results show that our approaches perform among the best of the related works, with a recognition rate of 95.83% on the KTH dataset and 87.57% on the GEMEP-FERA dataset.

Cited literature: 135 references
Contributor: Abes Star
Submitted on: Friday, January 17, 2014 - 2:52:10 PM
Last modification on: Wednesday, July 8, 2020 - 12:42:07 PM
Long-term archiving on: Friday, April 18, 2014 - 4:41:08 AM


Version validated by the jury (STAR)


  • HAL Id: tel-00871107, version 2


Moez Baccouche. Apprentissage neuronal de caractéristiques spatio-temporelles pour la classification automatique de séquences vidéo. Autre [cs.OH]. INSA de Lyon, 2013. Français. ⟨NNT : 2013ISAL0071⟩. ⟨tel-00871107v2⟩


