
L'Art de la Voix : Caractériser l'information vocale dans un choix artistique (The Art of the Voice: Characterizing Vocal Information in an Artistic Choice)

Abstract : To reach an international audience, audiovisual productions (films, TV shows, video games) must be translated into other languages. Generally, the original voice is replaced by a new voice in the target language. This process is referred to as dubbing. The voice casting process, which aims to choose a voice (an actor) in accordance with the original voice and the character, is performed manually by an artistic director (AD). Today, ADs are looking for new "talents" (less expensive and more available than experienced dubbers), but they cannot perform large-scale auditions. Automatic tools capable of measuring the adequacy between a voice in a source language and a voice in a target language/culture, in a given context, are of great interest to audiovisual companies. In addition, beyond voice casting, this voice selection problem echoes the major scientific questions of voice similarity and perception mechanisms.

In this work, we use the voices of professional actors selected by ADs in different languages from already dubbed works. First, we set up a protocol with state-of-the-art methods in automatic speaker recognition to highlight the existence of character/role-specific information in our data. We also identify the influence of linguistic bias on the performance of the system. Then, we build a methodological framework to evaluate the ability of an automatic system to discriminate pairs of voices playing the same character. The system we created is based on Siamese Neural Networks. In this evaluation protocol, we apply strong constraints to avoid possible biases (linguistic content, gender, etc.), and we learn a similarity measure that reflects the AD's choices with a significant difference that cannot be attributed to chance. Finally, we train a new representational space highlighting the character-specific information, called the p-vector. Thanks to our methodological framework, we show that this representation better discriminates the voices of new characters than a representation oriented toward speaker information. In addition, we show that it is possible to benefit from the generalized knowledge of a model learned on a similar dataset by using knowledge distillation in neural networks.

This thesis gives an initial answer for assisted voice casting and provides automatic tools capable of preselecting the relevant voices from a large set of voices in a target language. Although the information characteristic of an artistic choice can be extracted from a large volume of data, even when this choice is difficult to formalize, we have yet to identify the factors that explain the system's decisions. We would like to explain not only which voices are selected but also the reasons for this choice. Furthermore, understanding the decision process of the system would help us define the "voice palette". In future work, we would like to explore the influence of the target language and culture by extending our work to more languages. In the longer term, this work could help to understand how voice perception has evolved since the beginning of dubbing.
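
As an illustration of the Siamese idea mentioned in the abstract, the sketch below scores a pair of voice feature vectors by passing both through the same encoder and comparing the outputs. It is a minimal sketch under stated assumptions, not the architecture used in the thesis: the single tanh layer, the cosine score, the contrastive loss, the margin value, and all function names are hypothetical choices made for this example, and no training loop is shown.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build a toy shared encoder: one random linear layer followed by tanh.

    Hypothetical stand-in for a trained network; weights are random here.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)))
                for row in weights]

    return encode


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def siamese_score(encode, voice_a, voice_b):
    """Score a pair: both inputs go through the SAME encoder (shared weights)."""
    return cosine(encode(voice_a), encode(voice_b))


def contrastive_loss(score, same_character, margin=0.5):
    """Assumed loss: pull same-character pairs toward a score of 1,
    and penalize different-character pairs only above the margin."""
    if same_character:
        return (1.0 - score) ** 2
    return max(0.0, score - margin) ** 2


if __name__ == "__main__":
    encode = make_encoder(in_dim=3, out_dim=4)
    source_voice = [0.2, -0.1, 0.4]   # made-up feature vectors
    target_voice = [0.1, 0.3, -0.2]
    score = siamese_score(encode, source_voice, target_voice)
    print(score, contrastive_loss(score, same_character=True))
```

Sharing one encoder between the two branches keeps the score symmetric in its inputs, which suits a task where neither the source-language nor the target-language voice is privileged.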
Contributor : Abes Star
Submitted on : Wednesday, September 16, 2020 - 9:39:08 AM
Last modification on : Friday, October 23, 2020 - 5:04:48 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02938152, version 2



Adrien Gresse. L'Art de la Voix : Caractériser l'information vocale dans un choix artistique. Signal and Image Processing [eess.SP]. Université d'Avignon, 2020. French. ⟨NNT : 2020AVIG0236⟩. ⟨tel-02938152v2⟩


