Skip to Main content Skip to Navigation

Détection de marqueurs affectifs et attentionnels de personnes âgées en interaction avec un robot

Abstract : This thesis work focuses on audio-visual detection of emotional (laugh and smile) and attentional markers for elderly people in social interaction with a robot. To effectively understand and model the pattern of behavior of very old people in the presence of a robot, relevant data are needed. I participated in the collection of a corpus of elderly people in particular for recording visual data. The system used to control the robot is a Wizard of Oz, several daily conversation scenarios were used to encourage people to interact with the robot. These scenarios were developed as part of the ROMEO2 project with the Approche association. We described at first the corpus collected which contains 27 subjects of 85 years' old on average for a total of 9 hours, annotations and we discussed the results obtained from the analysis of annotations and two questionnaires.My research then focuses on the attention detection and the laughter and smile detection. The motivations for the attention detection are to detect when the subject is not addressing to the robot and adjust the robot's behavior to the situation. After considering the difficulties related to the elderly people and the analytical results obtained by the study of the corpus annotations, we focus on the rotation of the head at the visual index and energy and quality vote for the detection of the speech recipient. The laughter and smile detection can be used to study on the profile of the speaker and her emotions. My interests focus on laughter and smile detection in the visual modality and the fusion of audio-visual information to improve the performance of the automatic system. Spontaneous expressions are different from posed or acted expression in both appearance and timing. Designing a system that works on realistic data of the elderly is even more difficult because of several difficulties to consider such as the lack data for training the statistical model, the influence of the facial texture and the smiling pattern for visual detection, the influence of voice quality for auditory detection, the variety of reaction time, the level of listening comprehension, loss of sight for elderly people, etc. The systems of head-turning detection, attention detection and laughter and smile detection are evaluated on ROMEO2 corpus and partially evaluated (visual detections) on standard corpus Pointing04 and GENKI-4K to compare with the scores of the methods on the state of the art. We also found a negative correlation between laughter and smile detection performance and the number of laughter and smile events for the visual detection system and the audio-visual system. This phenomenon can be explained by the fact that elderly people who are more interested in experimentation laugh more often and therefore perform more various poses. The variety of poses and the lack of corresponding data bring difficulties for the laughter and smile recognition for our statistical systems. The experiments show that the head-turning can be effectively used to detect the loss of the subject's attention in the interaction with the robot. For the attention detection, the potential of a cascade method using both methods in a complementary manner is shown. This method gives better results than the audio system. For the laughter and smile detection, under the same leave-one-out protocol, the fusion of the two monomodal systems significantly improves the performance of the system at the segmental evaluation.
Complete list of metadata

Cited literature [153 references]  Display  Hide  Download
Contributor : Abes Star :  Contact Connect in order to contact the contributor
Submitted on : Monday, February 29, 2016 - 4:32:07 PM
Last modification on : Thursday, October 14, 2021 - 9:18:42 AM
Long-term archiving on: : Sunday, November 13, 2016 - 6:36:28 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01280505, version 1


Fan Yang. Détection de marqueurs affectifs et attentionnels de personnes âgées en interaction avec un robot. Intelligence artificielle [cs.AI]. Université Paris Saclay (COmUE), 2015. Français. ⟨NNT : 2015SACLS081⟩. ⟨tel-01280505⟩



Record views


Files downloads