Transformation automatique de la parole - Etude des transformations acoustiques

Larbi Mesbahi 1
1 CORDIAL - Human-machine spoken dialogue
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes, ENSSAT - École Nationale Supérieure des Sciences Appliquées et de Technologie
Abstract : This thesis addresses automatic voice conversion, whose main purpose is to modify the speech signal of an utterance so that it mimics the voice of another speaker. State-of-the-art Voice Conversion Systems (VCS) often use Gaussian Mixture Models (GMMs) to model the source and target voices. These systems learn linear conversion functions on the GMMs and can produce fairly good converted voices. However, they are subject to design flaws linked to the GMM training stage. Among these flaws are over-smoothing, which is an excess of generalization, and its opposite, over-fitting, which is an excess of specialization. One purpose of this thesis is to explore alternative conversion functions and various means of training them. The first idea pursued is to reduce the number of free parameters of the conversion function. The second is to replace linear conversion functions with a neural-network-based conversion function built on Radial Basis Functions (RBF). This thesis also focuses on the data used to train the GMMs and the conversion function. To train the conversion function, speech segments from the source and target speakers must be matched. In most use cases, however, the two speakers utter different sentences, so it is impossible to form parallel training corpora. Our proposal consists in matching vectors that have first been distributed into acoustic classes by a joint tree built on the existing data. Lastly, the parametrization step is studied because it contributes to the quality of the converted voice: as much of the speaker's characteristics as possible must be preserved in the parametrized data. To this end, we chose the True-Envelope characterization. As previous studies have shown, however, the dimensionality of this parametrization must be reduced before the data can be used as training material. Principal Component Analysis is used for this purpose. This solution is even more efficient when used to derive phone-specific conversion functions.
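
As an illustration of the kind of GMM-based linear conversion function the abstract refers to, here is a minimal Python sketch. It assumes the form commonly used in the voice conversion literature: each mixture component carries its own linear regression from source to target, and the regressions are blended by the component posteriors given the source feature. The abstract does not spell out the thesis's exact formulation, so the function `convert`, the dictionary keys, the scalar (one-dimensional) features, and all numeric values are assumptions made for the example.

```python
import math


def convert(x, components):
    """Map a scalar source feature x to the target space with a GMM-based
    piecewise-linear conversion function (1-D illustration only)."""
    # log(weight * N(x; mu_x, var_x)) per component, kept in the log domain
    # so that features far from every mean do not underflow to 0/0.
    log_scores = [
        math.log(c["weight"])
        - 0.5 * math.log(2.0 * math.pi * c["var_x"])
        - 0.5 * (x - c["mu_x"]) ** 2 / c["var_x"]
        for c in components
    ]
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    posteriors = [u / total for u in unnormalized]
    # Each component contributes its own linear regression from x to y,
    # weighted by the posterior probability of that component given x.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Hypothetical two-class model; the values are illustrative, not from the thesis.
example_model = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 1.0, "cov_yx": 0.5},
]
print(convert(0.0, example_model))
```

In a real system the features would be spectral-envelope vectors rather than scalars, so `var_x` and `cov_yx` would become covariance matrices. Each component adds its own regression parameters, which gives a sense of why the number of free parameters matters for the trade-off between over-smoothing and over-fitting.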
Document type :
Theses

Cited literature [11 references]

https://tel.archives-ouvertes.fr/tel-00547088
Contributor : Equipe-Projet Cordial
Submitted on : Wednesday, December 15, 2010 - 3:40:35 PM
Last modification on : Thursday, January 7, 2021 - 4:24:26 PM
Long-term archiving on : Monday, November 5, 2012 - 1:51:17 PM

Identifiers

  • HAL Id : tel-00547088, version 1

Citation

Larbi Mesbahi. Transformation automatique de la parole - Etude des transformations acoustiques. Interface homme-machine [cs.HC]. Université Rennes 1, 2010. Français. ⟨tel-00547088⟩

Metrics

  • Record views : 691
  • File downloads : 287