Machine Learning for Image Based Motion Capture

Ankur Agarwal

Thesis. Year: 2006

Learning and recognition in vision

Abstract

Image based motion capture has recently gained considerable attention within the study of human motion in computer vision. The problem involves estimating the 3D configurations of a human body from a set of images, and its applications include human computer interaction, smart surveillance, video analysis and animation. This thesis takes a machine learning based approach to reconstructing 3D pose and motion from monocular images or video. It uses a collection of images and motion capture data to derive mathematical models that allow full body configurations to be recovered directly from image features. The approach is completely data-driven and avoids the use of a human body model, which makes inference extremely fast.

We formulate a class of regression based methods that distill a large training database of motion capture and image data into a compact model that generalizes to predicting pose from new images. The methods rely on appropriately developed robust image descriptors, on learning dynamical models of human motion, and on kernelizing the input within a sparse regression framework.

First, we show how pose can be recovered effectively and efficiently from image silhouettes extracted using background subtraction. We exploit the sparseness properties of the relevance vector machine for improved generalization and efficiency, and use a mixture of regressors to handle probabilistically the ambiguities present in monocular silhouette based 3D reconstruction. The methods developed enable pose reconstruction from single images as well as motion tracking in video sequences. Second, the framework is extended to recover 3D pose from cluttered images by introducing an image encoding that is resistant to changes in background. We show that non-negative matrix factorization can be used to suppress background features and allow the regression to selectively cue on features from the foreground human body. Finally, we study image encoding methods in a broader context and present a novel multi-level image encoding framework called 'hyperfeatures' that proves effective for object recognition and image classification tasks.
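As a rough illustration of the regression idea described in the abstract, the following minimal sketch maps image descriptor vectors to pose vectors through a Gaussian kernel. It is not the method of the thesis: it substitutes plain ridge-regularized kernel regression for the relevance vector machine, so it has none of the sparseness properties mentioned above. The function names, the toy two-dimensional descriptors and poses, and the `ridge` and `width` parameters are all assumptions made for this example.

```python
import math


def gaussian_kernel(a, b, width=1.0):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each matrix row with the matching right-hand-side row.
    aug = [list(row) + list(r) for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(descriptors, poses, ridge=1e-3, width=1.0):
    """Kernel ridge regression: one weight row per training example."""
    n = len(descriptors)
    gram = [
        [
            gaussian_kernel(descriptors[i], descriptors[j], width)
            + (ridge if i == j else 0.0)  # ridge term on the diagonal
            for j in range(n)
        ]
        for i in range(n)
    ]
    return solve(gram, poses)


def predict(descriptor, descriptors, weights, width=1.0):
    """Predict a pose vector as a kernel-weighted sum of the weight rows."""
    k = [gaussian_kernel(descriptor, d, width) for d in descriptors]
    dims = len(weights[0])
    return [sum(k_i * w[d] for k_i, w in zip(k, weights)) for d in range(dims)]


# Toy data (hypothetical): 2D "descriptors" paired with 2D "poses".
train_descriptors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_poses = [[0.0, 10.0], [5.0, 10.0], [0.0, 20.0], [5.0, 20.0]]

weights = fit(train_descriptors, train_poses)
estimate = predict([1.0, 0.0], train_descriptors, weights)
print(estimate)
```

With this toy data, querying a training descriptor returns a pose close to its training target. Every training example keeps a weight row here, whereas a sparse regressor such as the relevance vector machine aims to retain only a subset of them.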

Keywords

computer vision, human motion analysis, machine learning

Domains

Human-Computer Interaction [cs.HC]

Main file

Agarwal-thesis.pdf (4.71 MB)

Contributor: William Triggs

https://theses.hal.science/tel-00390301

Submitted on: Monday, June 1, 2009, 5:39:52 PM

Last modification on: Thursday, April 4, 2024, 9:28:38 PM

Long-term archiving on: Monday, October 15, 2012, 11:35:33 AM

Dates and versions

tel-00390301, version 1 (01-06-2009)

Identifiers

HAL Id: tel-00390301, version 1

Cite

Ankur Agarwal. Machine Learning for Image Based Motion Capture. Human-Computer Interaction [cs.HC]. Institut National Polytechnique de Grenoble - INPG, 2006. English. ⟨NNT : ⟩. ⟨tel-00390301⟩

Collections

UGA, IMAG, CNRS, INRIA, INRIA2
