
Supervised Learning Approaches for Automatic Structuring of Videos

Danila Potapov 1 
1 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract: Automatic interpretation and understanding of videos remains at the frontier of computer vision. The core challenge is to lift the expressive power of current visual features (as well as features from other modalities, such as audio or text) so that they can automatically recognize typical video sections with low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Inria Action Movies). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification, and localization. First, we propose an automatic video summarization method that yields a short and highly informative summary of potentially long videos, tailored to specified categories of videos. We also introduce a new dataset for evaluating video summarization methods, called MED-Summaries, which contains complete importance-score annotations of the videos, along with a complete set of evaluation tools. Second, we introduce a new dataset, called Inria Action Movies, consisting of long movies annotated with non-exclusive semantic categories (called beat-categories), whose definition is broad enough to cover most of the movie footage. Categories such as "pursuit" or "romance" in action movies are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. Third, we give an overview of the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014.

Cited literature: 124 references
Contributor: ABES STAR
Submitted on: Friday, December 4, 2015 - 12:13:09 PM
Last modification on: Friday, March 25, 2022 - 9:43:22 AM
Long-term archiving on: Saturday, April 29, 2017 - 4:54:24 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01238100, version 1



Danila Potapov. Supervised Learning Approaches for Automatic Structuring of Videos. Computer Vision and Pattern Recognition [cs.CV]. Université Grenoble Alpes, 2015. English. ⟨NNT : 2015GREAM023⟩. ⟨tel-01238100⟩


