Conversion de voix pour la synthèse de la parole

Abstract : This thesis is part of the research carried out by the R&D division of France Telecom in the field of text-to-speech synthesis. More specifically, it concerns voice conversion, a technology that modifies a source speaker's speech so that it is perceived as if another speaker had uttered it. The aim of the thesis is therefore to diversify synthesis voices through the design and development of a high-quality voice conversion system. The approaches studied are based on Gaussian mixture model (GMM) classification techniques and on harmonic plus noise model (HNM) representation of the speech signal. First, the influence of the spectral feature coding on the performance of GMM-based voice conversion is analyzed. Then, the dependence between the spectral envelope and the fundamental frequency is highlighted. Two voice conversion methods exploiting this dependence are proposed, and in evaluation they compare favorably with the existing state of the art. Problems related to the implementation of the voice conversion system are also addressed. The first problem is the high complexity of the conversion process relative to the synthesis process itself: the conversion task costs between 1.5 and 2 times as much as the synthesis task. To address this, a simplified GMM-based voice conversion procedure is presented, which reduces the conversion complexity by a factor of between 45 and 130. The second problem concerns the learning of the conversion function when the source and target training corpora differ. A method is proposed that makes it possible to train the transformation function from arbitrary recordings.
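
As an illustration of what a GMM-based conversion function can look like, the sketch below applies the commonly used mixture-of-linear-regressions form, F(x) = sum_i p(i | x) * (mu_y[i] + cov_yx[i] / var_x[i] * (x - mu_x[i])), to one spectral feature vector. It is not taken from the thesis: the diagonal covariances, the function names (`log_gaussian_diag`, `posteriors`, `convert_frame`) and the parameter layout are all assumptions made for this example, and the model parameters are presumed to have been trained elsewhere.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, mu_x, var_x):
    """Posterior probability p(i | x) of each mixture component."""
    log_terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, mu_x, var_x)
    ]
    peak = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - peak) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def convert_frame(x, weights, mu_x, mu_y, var_x, cov_yx):
    """Map one source feature vector to the target space.

    Hypothetical parameter layout: one list entry per mixture component,
    each entry a list with one value per feature dimension.
    """
    post = posteriors(x, weights, mu_x, var_x)
    y = [0.0] * len(mu_y[0])
    for p, mx, my, vx, cyx in zip(post, mu_x, mu_y, var_x, cov_yx):
        for d in range(len(y)):
            # Per-component linear regression, weighted by the posterior.
            y[d] += p * (my[d] + cyx[d] / vx[d] * (x[d] - mx[d]))
    return y
```

With a single component the function reduces to one linear regression; with several components, each frame is converted by a posterior-weighted blend of the per-component regressions, so every frame requires evaluating all components.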

Cited literature [63 references]

https://tel.archives-ouvertes.fr/tel-00009570
Contributor : Taoufik En-Najjary
Submitted on : Wednesday, June 22, 2005 - 4:26:09 PM
Last modification on : Sunday, November 10, 2019 - 1:14:27 AM
Long-term archiving on : Friday, September 14, 2012 - 1:25:25 PM

Identifiers

  • HAL Id : tel-00009570, version 1

Citation

Taoufik En-Najjary. Conversion de voix pour la synthèse de la parole. Traitement du signal et de l'image [eess.SP]. Université Rennes 1, 2005. Français. ⟨tel-00009570⟩
