
Analysis and recognition of emotions in call-center conversations

Abstract: Automatic emotion recognition in speech is a relatively recent research subject in the field of natural language processing: it was first proposed only about ten years ago. It now receives much attention, not only in academia but also in industry, thanks to improved model performance and system reliability. The first studies were based on acted, non-spontaneous speech. Until now, most experiments on emotion carried out by the research community were performed on pre-segmented sequences with a single speaker, not on spontaneous speech with several speakers. As a consequence, models built on acted data are hardly usable on data collected in natural settings.

The studies presented in this thesis are based on call-center conversations, about 1,620 hours of dialogs, each involving at least two human speakers (a commercial agent and a client). Our aim is to detect client satisfaction through emotional expression.

In the first part of this work we present the results obtained with models using only acoustic or only linguistic features for emotion detection. We show that an approach taking into account just one of these feature types is not sufficient to obtain correct results. To overcome this limitation we propose combining three types of features (acoustic, lexical and semantic). We show that models based on feature fusion yield higher recognition scores in all cases than models using acoustic features alone. This gain is preserved even with a fully automatic pipeline (automatic segmentation of the conversations, transcriptions produced by automatic speech recognition).

In the second part of our study we observe that, even though models based on feature combination are relevant for emotion detection, our training set is too small when the resulting models are applied to large amounts of test data.
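The feature-fusion idea described above can be illustrated with a minimal sketch: per-turn acoustic, lexical and semantic vectors are concatenated into a single representation before classification. All function names, feature meanings and values below are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch of feature-level fusion for emotion detection.
# Feature names, dimensions and the toy classifier are assumptions.

def fuse_features(acoustic, lexical, semantic):
    """Concatenate the three per-turn feature vectors into one."""
    return list(acoustic) + list(lexical) + list(semantic)

def linear_score(features, weights, bias=0.0):
    """Toy linear classifier over the fused vector (illustrative only)."""
    assert len(features) == len(weights)
    return sum(f * w for f, w in zip(features, weights)) + bias

# One speech turn described by three modalities (made-up values):
acoustic = [0.8, 0.3]   # e.g. pitch variation, energy
lexical  = [1.0, 0.0]   # e.g. presence of negative / positive words
semantic = [0.5]        # e.g. topic-based polarity

x = fuse_features(acoustic, lexical, semantic)
print(len(x))  # 5 fused dimensions
```

A real system would feed such fused vectors to a trained classifier; the point of the sketch is only that fusion operates on one combined representation rather than on each modality separately.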
To overcome this problem we propose a new method to automatically enrich the training set with new data, selected according to linguistic and acoustic criteria from 100 additional hours of recordings. These additions double the amount of data in our training set and increase the emotion recognition rate compared to the non-enriched models.

Finally, in the last part we evaluate our method on entire conversations, rather than only on conversation turns as most studies do. To classify a dialog we use the models built in the previous steps of this work and add two new feature groups: i) structural features, such as the length of the conversation and the proportion of speech of each speaker in the dialog; ii) dialogic features, such as the topic of the conversation and a new concept we call "affective implication", which aims to capture the impact of the current speaker's emotional production on the other speaker. We show that by combining all this information we can obtain results close to human performance.
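The "structural" conversation features mentioned above (conversation length, per-speaker speech proportion) can be sketched as a small computation over a dialog's turns. The turn representation and field names here are assumptions made for the sketch, not the thesis's actual data format.

```python
# Illustrative computation of structural dialog features.
# A dialog is assumed to be a list of (speaker, duration_seconds) turns.

def structural_features(turns):
    """Return total dialog length and each speaker's share of speech time."""
    total = sum(duration for _, duration in turns)
    per_speaker = {}
    for speaker, duration in turns:
        per_speaker[speaker] = per_speaker.get(speaker, 0.0) + duration
    proportions = {s: d / total for s, d in per_speaker.items()}
    return {"length_s": total, "speech_proportion": proportions}

dialog = [("agent", 12.0), ("client", 8.0), ("agent", 4.0), ("client", 16.0)]
feats = structural_features(dialog)
print(feats["length_s"])                     # 40.0
print(feats["speech_proportion"]["client"])  # 0.6
```

In a whole-conversation classifier, such values would simply be appended to the fused acoustic, lexical and semantic features before training.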
Submitted on : Thursday, November 29, 2012 - 10:03:14 AM
Last modification on : Monday, December 14, 2020 - 9:55:41 AM
Long-term archiving on: Saturday, December 17, 2016 - 4:57:10 PM


Version validated by the jury (STAR)


  • HAL Id : tel-00758650, version 1



Christophe Vaudable. Analyse et reconnaissance des émotions lors de conversations de centres d'appels. Autre [cs.OH]. Université Paris Sud - Paris XI, 2012. Français. ⟨NNT : 2012PA112128⟩. ⟨tel-00758650⟩


