
Deep Regression Models and Computer Vision Applications for Multiperson Human-Robot Interaction

Abstract : In order to interact with humans, robots need to perform basic perception tasks such as face detection, human pose estimation or speech recognition. However, in order to have a natural interaction with humans, the robot needs to model high-level concepts such as speech turns, focus of attention or interactions between participants in a conversation. In this manuscript, we follow a top-down approach. On the one hand, we present two high-level methods that model collective human behaviors. We propose a model able to recognize activities that are performed jointly by different groups of people, such as queueing or talking. Our approach handles the general case where several group activities can occur simultaneously and in sequence. On the other hand, we introduce a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy in the context of human-robot interaction. The robot is able to learn to focus its attention on groups of people from its own audio-visual experiences. Second, we study in detail deep learning approaches for regression problems. Regression problems are crucial in the context of human-robot interaction in order to obtain reliable information about head and body poses or the age of the persons facing the robot. Consequently, these contributions are very general and can be applied in many different contexts. First, we propose to couple a Gaussian mixture of linear inverse regressions with a convolutional neural network. Second, we introduce a Gaussian-uniform mixture model in order to make the training algorithm more robust to noisy annotations. Finally, we perform a large-scale study to measure the impact of several architecture choices and extract practical recommendations when using deep learning approaches in regression tasks.
For each of these contributions, a strong experimental validation has been performed with real-time experiments on the NAO robot or on large and diverse datasets.

Cited literature [216 references]
Contributor : Abes Star
Submitted on : Monday, October 29, 2018 - 1:39:06 PM
Last modification on : Wednesday, November 4, 2020 - 3:23:18 PM
Long-term archiving on : Wednesday, January 30, 2019 - 3:21:13 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01801807, version 2



Stéphane Lathuiliere. Deep Regression Models and Computer Vision Applications for Multiperson Human-Robot Interaction. Human-Computer Interaction [cs.HC]. Université Grenoble Alpes, 2018. English. ⟨NNT : 2018GREAM026⟩. ⟨tel-01801807v2⟩