
Fusion de données audio-visuelles pour l'interaction Homme-Robot (Audio-visual data fusion for human-robot interaction)

Abstract : In the framework of assistance robotics, this PhD aims at merging two channels of information (visual and auditory) potentially available on a robot. The goal is to complete and/or confirm data that a single channel alone could have supplied, in order to achieve advanced interaction between a human and a robot. To do so, we propose a perceptual interface for multimodal interaction whose goal is to interpret speech and gesture jointly, in particular for the use of spatial references. In this thesis, we first describe the speech part of this work, which consists of an embedded recognition and interpretation system for continuous speech. Then comes the vision part, which is composed of a visual multi-target tracker that tracks, in 3D, the head and the two hands of a human in front of the robot, and a second tracker for the head orientation. The outputs of these trackers feed the gesture recognition system described next. We continue with the description of a module dedicated to fusing the outputs of these information sources in a probabilistic framework. Last, we demonstrate the interest and feasibility of such a multimodal interface through demonstrations on the LAAS-CNRS robots. All the modules described in this thesis run in quasi-real time on these real robotic platforms.
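The abstract mentions fusing speech and gesture evidence in a probabilistic framework to resolve spatial references. As a minimal sketch of this kind of fusion (the function names, candidate objects, and probability values below are illustrative assumptions, not taken from the thesis), one can combine per-object likelihoods from each modality by a normalized product, in naive-Bayes style:

```python
# Hedged sketch: probabilistic fusion of speech and gesture evidence over
# candidate referent objects. All names and numbers are illustrative; the
# thesis's actual fusion module is not reproduced here.

def fuse(speech_probs, gesture_probs):
    """Combine per-object likelihoods from two modalities by a
    normalized product (naive-Bayes-style fusion)."""
    assert speech_probs.keys() == gesture_probs.keys()
    joint = {obj: speech_probs[obj] * gesture_probs[obj]
             for obj in speech_probs}
    total = sum(joint.values())
    return {obj: p / total for obj, p in joint.items()}

# Hypothetical per-object scores from each channel:
speech = {"red_box": 0.6, "blue_box": 0.3, "cup": 0.1}   # speech interpreter
gesture = {"red_box": 0.2, "blue_box": 0.7, "cup": 0.1}  # pointing-gesture tracker
fused = fuse(speech, gesture)
best = max(fused, key=fused.get)  # ambiguous speech resolved by the gesture
```

With these example numbers, the speech channel slightly favors the red box, but the pointing gesture shifts the fused posterior to the blue box, illustrating how one channel can confirm or correct the other.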
Document type : Doctoral thesis

Cited literature : 195 references
Contributor : Arlette Evrard
Submitted on : Wednesday, June 23, 2010 - 9:59:11 AM
Last modification on : Friday, October 23, 2020 - 4:45:24 PM
Long-term archiving on : Monday, October 22, 2012 - 2:35:44 PM


  • HAL Id : tel-00494382, version 1


Brice Burger. Fusion de données audio-visuelles pour l'interaction Homme-Robot. Automatique / Robotique. Université Paul Sabatier - Toulouse III, 2010. Français. ⟨tel-00494382⟩


