
Ancrages et modèles dynamiques de la prosodie : application à la reconnaissance des émotions actées et spontanées

Abstract : Recognizing the emotional state of a speaker is an important step toward making human-machine communication more natural and friendly. In this thesis we study emotion-oriented automatic speech processing (ASP) on both acted and natural data. The study of spontaneous emotions is conducted in conjunction with the study of communication disorders, which limit the development of a child's interaction abilities. Techniques derived from emotion-oriented ASP must rely on robust parameters to describe the emotional correlates, and must also cope with the constraints imposed by changes of speaker and semantic context. To this end, our work uses automated techniques to perform emotion recognition: we use many complementary anchors of speech (e.g., pseudo-phonemes) to extract different types of parameters from the signal (e.g., acoustic and prosodic), and we combine techniques to estimate their contributions to the recognition task. Particular effort has been devoted to developing new, unconventional models of speech rhythm, since this component is not clearly modeled in state-of-the-art emotion recognition systems. The experiments conducted in this thesis aim to demonstrate the relevance of using several anchor points of speech and their associated rhythmic patterns to identify the features that are correlated with emotions. The study of prototypical emotions made it possible to define a continuum that represents the emotional categories in line with Plutchik's wheel of emotions. The analysis of communication disorders is carried out in close collaboration with clinicians and research teams in emotion-oriented ASP.
This work aims to use automated methods (i.e., identification of speech anchor points and extraction of prosodic features) to characterize the features associated with a given language impairment (LI), e.g., autism, dysphasia, and pervasive developmental disorder not otherwise specified (PDD-NOS). A control group of typically developing children is also used as a reference against which the prosodic abilities of the LI subjects are compared. The results obtained in this study are very promising: they contributed significantly to discriminating all of the LI subjects from the typically developing children, and also to discriminating among the different LI groups, in two distinct types of tasks: (i) imitation of intonation contours (constrained task) and (ii) production of spontaneous emotional speech (unconstrained task). In addition, the automatic analysis of these data made it possible to recover the diagnostic criteria defined by clinicians for the different groups of LI children. Current ASP techniques can thus overcome the difficulties raised by the study of spontaneous speech produced by children's voices. This opens the way to the difficult but fascinating task of making the communication systems currently available to us friendlier and less "cold".
Contributor : Theses Bupmc
Submitted on : Thursday, May 23, 2013 - 2:18:08 PM
Last modification on : Monday, July 18, 2022 - 10:52:35 AM
Long-term archiving on : Saturday, August 24, 2013 - 6:20:18 AM


  • HAL Id : tel-00825312, version 1


Fabien Ringeval. Ancrages et modèles dynamiques de la prosodie : application à la reconnaissance des émotions actées et spontanées. Traitement du signal et de l'image [eess.SP]. Université Pierre et Marie Curie - Paris VI, 2011. Français. ⟨NNT : 2011PA066048⟩. ⟨tel-00825312⟩


