
Description de contenu vidéo : mouvements et élasticité temporelle (Video content description: motion and temporal elasticity)

Abstract : Video recognition has gained in performance in recent years, largely thanks to improvements in deep learning on images. However, the jump in recognition rates on images does not carry over directly to videos. This limitation is most likely due to the added dimension, time, from which a robust description is still hard to extract. Recurrent neural networks introduce temporality, but they have limited memory. State-of-the-art video description methods usually handle time as a spatial dimension, and combinations of video description methods reach the current best accuracies. However, the temporal dimension has its own elasticity, different from that of the spatial dimensions. Indeed, the temporal dimension of a video can be locally deformed: a partial dilation produces a visual slowdown in the video without changing how it is understood, whereas a spatial dilation of an image modifies the proportions of the objects shown. We can thus expect to improve video content classification by building a description that is invariant to these speed changes. This thesis focuses on the question of a robust video description that takes the elasticity of the temporal dimension into account, from three different angles. First, we describe the motion content locally and explicitly. Singularities are detected in the optical flow, then tracked along the time axis and organized into chains to describe video segments. We apply this description to sports content. Second, we extract global and implicit descriptions using tensor decompositions. Tensors make it possible to treat a video as a multi-dimensional data table. The extracted descriptions are evaluated on a classification task. Finally, we study speed normalization methods based on Dynamic Time Warping of time series. We show that this normalization improves classification rates.
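To illustrate the idea of temporal elasticity mentioned in the abstract, the following is a minimal sketch of the textbook Dynamic Time Warping recurrence on one-dimensional series. It is not the thesis's implementation: the function name `dtw_distance`, the absolute-difference local cost, and the toy signals are assumptions chosen for illustration only.

```python
def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (illustrative sketch).

    Assumption: the local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# Hypothetical toy data: the same motion pattern played at half speed
signal = [0.0, 1.0, 2.0, 1.0, 0.0]
slowed = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]

print(dtw_distance(signal, slowed))  # 0.0: the slowdown is absorbed by the warping
```

In this toy case the slowed-down series is at zero DTW distance from the original, which is the kind of invariance to local speed changes the abstract argues for.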

Cited literature : 127 references
Contributor : Abes Star
Submitted on : Wednesday, February 6, 2019 - 6:28:34 PM
Last modification on : Wednesday, October 14, 2020 - 4:22:22 AM
Long-term archiving on : Tuesday, May 7, 2019 - 3:13:00 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02010091, version 1



Katy Blanc. Description de contenu vidéo : mouvements et élasticité temporelle. Vision par ordinateur et reconnaissance de formes [cs.CV]. Université Côte d'Azur, 2018. Français. ⟨NNT : 2018AZUR4212⟩. ⟨tel-02010091⟩


