Deep Learning for Human Motion Analysis

Abstract : The research goal of this work is to develop learning methods that advance the automatic analysis and interpretation of human motion from different perspectives and based on various sources of information, such as images, video, depth, mocap data, audio and inertial sensors. For this purpose, we propose several deep neural models and associated training algorithms for supervised classification and semi-supervised feature learning, as well as for modelling temporal dependencies, and show their efficiency on a set of fundamental tasks, including detection, classification, parameter estimation and user verification. First, we present a method for human action and gesture spotting and classification based on multi-scale and multi-modal deep learning from visual signals (such as video, depth and mocap data). Key to our technique is a training strategy which exploits, first, careful initialization of individual modalities and, second, gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. Moving from 1-to-N mapping to continuous evaluation of gesture parameters, we then address the problem of hand pose estimation and present a new method for regression on depth images, based on semi-supervised learning using convolutional deep neural networks, where raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. In separate but related work, we explore convolutional temporal models for authenticating users based on their motion patterns. In this project, the data is captured by inertial sensors (such as accelerometers and gyroscopes) built into mobile devices. We propose an optimized shift-invariant dense convolutional mechanism and incorporate the discriminatively-trained dynamic features in a probabilistic generative framework taking into account temporal characteristics.
Our results demonstrate that human kinematics convey important information about user identity and can serve as a valuable component of multi-modal authentication systems.
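
As an illustration of the "random dropping of separate channels" idea mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes that each training sample is a dictionary mapping a modality name to a flat feature list, that a dropped modality is replaced by zeros, and that modalities are dropped independently with a fixed probability. The function name `mod_drop` and the parameter `drop_prob` are hypothetical and chosen only for this example.

```python
import random
from typing import Dict, List, Optional


def mod_drop(
    sample: Dict[str, List[float]],
    drop_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Return a copy of `sample` with some modalities zeroed out.

    Assumption for this sketch: each modality is dropped independently
    with probability `drop_prob`, and "dropping" means replacing its
    features with zeros. The input sample is left unmodified.
    """
    if rng is None:
        rng = random.Random()
    out: Dict[str, List[float]] = {}
    for name, features in sample.items():
        if rng.random() < drop_prob:
            # Drop this modality: keep its shape, erase its content.
            out[name] = [0.0] * len(features)
        else:
            # Keep this modality unchanged (copied, not aliased).
            out[name] = list(features)
    return out


# Hypothetical toy sample with three visual modalities.
sample = {"video": [1.0, 2.0], "depth": [3.0], "mocap": [4.0, 5.0, 6.0]}
augmented = mod_drop(sample, drop_prob=0.5, rng=random.Random(0))
```

In a training loop, such a function would be applied to each sample before the fused network sees it, so that the shared layers cannot rely on any single modality always being present. Whether at least one modality should always be kept is a design choice that this sketch does not address.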
Contributor : Natalia Neverova
Submitted on : Friday, February 17, 2017 - 2:36:48 PM
Last modification on : Wednesday, October 31, 2018 - 12:24:25 PM
Long-term archiving on : Thursday, May 18, 2017 - 2:31:36 PM


  • HAL Id : tel-01470466, version 1


Natalia Neverova. Deep Learning for Human Motion Analysis. Computer Vision and Pattern Recognition [cs.CV]. INSA Lyon, 2016. English. ⟨NNT : 2016LYSEI029⟩. ⟨tel-01470466v1⟩


