
Conjugate Mixture Models for the Modeling of Visual and Auditory Perception

Vasil Khalidov 1
1 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology, LJK - Laboratoire Jean Kuntzmann, Inria Grenoble - Rhône-Alpes
Abstract : This thesis addresses the modelling of audio-visual perception with a head-like device. It treats the related problems of audio-visual calibration and audio-visual object detection, localization, and tracking. A spatio-temporal approach to calibrating the head-like device is proposed, based on probabilistic multimodal trajectory matching. The formalism of conjugate mixture models is introduced, along with a family of efficient optimization algorithms for multimodal clustering. One instance of this family, the conjugate expectation maximization (ConjEM) algorithm, is further improved to gain attractive theoretical properties. Methods for multimodal object detection and for estimating the number of objects are developed, and their theoretical properties are discussed. Finally, the proposed multimodal clustering method is combined with the object detection and object number estimation strategies and with known tracking techniques to perform multimodal multi-object tracking. Performance is demonstrated on simulated data and on a database of realistic audio-visual scenarios (the CAVA database).
Contributor : Team Perception
Submitted on : Thursday, April 7, 2011 - 2:43:43 PM
Last modification on : Tuesday, February 9, 2021 - 3:20:36 PM
Long-term archiving on : Thursday, November 8, 2012 - 3:40:33 PM


  • HAL Id : tel-00584080, version 1




Vasil Khalidov. Conjugate Mixture Models for the Modeling of Visual and Auditory Perception. Human-Computer Interaction [cs.HC]. Université Joseph-Fourier - Grenoble I, 2010. English. ⟨tel-00584080v1⟩


