
Deep Regression Models and Computer Vision Applications for Multi-Person Human-Robot Interaction

Stéphane Lathuilière 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology, LJK - Laboratoire Jean Kuntzmann
Abstract : In order to interact with humans, robots need to perform basic perception tasks such as face detection, human pose estimation or speech recognition. However, in order to have a natural interaction with humans, the robot needs to model high-level concepts such as speech turns, focus of attention or interactions between participants in a conversation. In this manuscript, we follow a top-down approach. First, we present two high-level methods that model collective human behaviors. We propose a model able to recognize activities that are performed jointly by different groups of people, such as queueing or talking. Our approach handles the general case where several group activities can occur simultaneously and in sequence. We also introduce a novel neural network-based reinforcement learning approach for robot gaze control. This approach enables a robot to learn and adapt its gaze control strategy in the context of human-robot interaction. The robot is able to learn to focus its attention on groups of people from its own audio-visual experiences. Second, we study in detail deep learning approaches for regression problems. Regression problems are crucial in the context of human-robot interaction in order to obtain reliable information about head and body poses or the age of the persons facing the robot. These contributions are general and can also be applied in many other contexts. Specifically, we propose to couple a Gaussian mixture of linear inverse regressions with a convolutional neural network. We then introduce a Gaussian-uniform mixture model in order to make the training algorithm more robust to noisy annotations. Finally, we perform a large-scale study to measure the impact of several architecture choices and extract practical recommendations for using deep learning approaches in regression tasks.
Each of these contributions has been validated through extensive experiments, either in real time on the NAO robot or on large and diverse datasets.

Cited literature [193 references]
Contributor : Team Perception
Submitted on : Monday, May 28, 2018 - 5:00:36 PM
Last modification on : Thursday, November 19, 2020 - 1:02:22 PM
Long-term archiving on : Wednesday, August 29, 2018 - 3:21:40 PM


Files produced by the author(s)


  • HAL Id : tel-01801807, version 1


Stéphane Lathuilière. Deep Regression Models and Computer Vision Applications for Multi-Person Human-Robot Interaction. Computer Vision and Pattern Recognition [cs.CV]. Université Grenoble - Alpes, 2018. English. ⟨tel-01801807v1⟩


