
Expressive speech generation for tonal languages (Génération de parole expressive dans le cas des langues à tons)

Abstract : Human-computer interaction is becoming more natural and increasingly resembles human-human interaction, including in its expressiveness (especially emotions and attitudes). In spoken communication, attitudes, or social affects, are mainly conveyed through prosody. In tonal languages, prosody is also used to encode semantic information via tones. This thesis presents a study of social affects in Vietnamese, a tonal and under-resourced language, with the aim of applying the results to Vietnamese expressive speech synthesis. The first task of this thesis is the construction of a first audio-visual corpus of Vietnamese attitudes, which contains sixteen attitudes. This corpus is then used to study the audio-visual and intercultural perception of Vietnamese attitudes. A series of perceptual tests was carried out with native listeners and non-native (French) listeners. The experimental results show that the factors influencing the perception of attitudes include the modality of presentation (audio, visual, or audio-visual) and the attitudinal expression itself. These results also allow us to investigate the characteristics shared by, and those specific to, Vietnamese and French attitudes. Another perception test was carried out using sentences with tonal variation to study the influence of Vietnamese tones on the perception of attitudes. The results show that non-native listeners can process the local prosodic cues of tones together with the global cues of attitude patterns. After presenting our studies on Vietnamese social affects, we describe our work on attitude modelling and its application to Vietnamese expressive speech synthesis. Based on the concept of prosodic contour superposition, a prosodic model is proposed to encode the attitudinal function of prosody for Vietnamese attitudes. This model was applied to generate Vietnamese expressive speech and then evaluated in a perceptual experiment with synthetic utterances. The results validate the applicability of the proposed model to generating the prosody of attitudes for a tonal language such as Vietnamese.

Cited literature : 223 references
Contributor : Abes Star
Submitted on : Friday, September 6, 2013 - 4:22:12 PM
Last modification on : Tuesday, November 24, 2020 - 4:20:03 PM
Long-term archiving on : Saturday, December 7, 2013 - 4:21:02 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00859201, version 1



Dang Khoa Mac. Génération de parole expressive dans le cas des langues à tons. Autre. Université de Grenoble; Institut Polytechnique (Hanoï), 2012. Français. ⟨NNT : 2012GRENT016⟩. ⟨tel-00859201⟩


