Developing Audio-Visual capabilities of humanoid robot NAO

Jordi Sanchez-Riera 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : Humanoid robots are becoming more and more important in our daily lives due to their high potential to help people in different situations. To be able to help, human-robot interaction is essential, and to this end it is important to make the best possible use of the external information collected by the robot's sensors. The most relevant sensors for perception are usually cameras and microphones, which provide very rich information about the world. In this thesis, we aim to develop applications towards human-robot interaction and to achieve more natural communication with the robot. Taking advantage of the information provided by the cameras and microphones of the NAO humanoid robot, we present new algorithms and applications using these sensors. With the visual information we introduce two different stereo algorithms that serve as a basis for designing other applications. The first stereo algorithm is designed to avoid problems with textureless regions by using information from images at different time instants. The second stereo algorithm, scene flow, is designed to provide a more complete understanding of a scene by adding optical flow information to the computation of disparity. As a result, a position and a velocity vector are available for each pixel. This provides a basis for developing higher-level applications that involve some degree of interaction. Using the scene flow algorithm, a descriptor is designed for action recognition. Action recognition thus benefits from richer information than traditional monocular approaches have, which gives robustness to background clutter and disambiguates actions along the depth direction such as 'punch'. To complement and improve the performance of action recognition, auditory information is added. It is well known that auditory data is complementary to visual data and can be helpful in situations where objects are occluded or are simply not there.
Finally, the last application developed towards better human-robot interaction is a speaker detector. This can be used, for example, to center the camera on the speaking person (the person of interest) and collect more reliable information. Here too, both video and audio data are used, but the principle is completely different, from the visual and auditory features used to the way these features are combined.
Contributor : Team Perception
Submitted on : Thursday, September 19, 2013 - 3:16:41 PM
Last modification on : Wednesday, April 11, 2018 - 1:58:43 AM
Long-term archiving on: Friday, December 20, 2013 - 3:09:28 PM


HAL Id : tel-00863461, version 1




Jordi Sanchez-Riera. Developing Audio-Visual capabilities of humanoid robot NAO. Robotics [cs.RO]. Université de Grenoble, 2013. English. ⟨tel-00863461⟩


