Contents

Methodological framework
Experiments
Conclusion

Chapter 7
The p-vector: a character representation space
2.1 The p-vector: a representation of the information characteristic of the character
Homogenisation of the information through distillation
Knowledge distillation
Experiments
Analysis of the results
Conclusion
Automatic voice recommendation system

List of figures

2.3 Diagram of an automatic speaker verification system. Modelling makes it possible, among other things, to obtain a fixed-size representation from a parameter sequence of variable length
Diagram of a 4-component GMM-UBM and adaptation of the speaker model with the MAP procedure
The learning of Bottleneck Features is guided by a discrimination task
x-vector architecture
Three-dimensional model of emotions (1984)
Circumplex model of emotions (1980)
Brunswik's lens model (Brunswik 1956)
Simplified view of the similarity system
Probabilistic Linear Discriminant Analysis (PLDA) estimates the similarity between two i-vectors by means of a Likelihood Ratio (LR)
Illustration of the reference system (A) and of the two variants proposed in our approach (B and C)
Naming scheme of the voice-segment files
Histogram of the durations of the voice segments in the Mass Effect 3 corpus. Segments longer than 10 s (42 segments) are not shown for practical reasons
The upper plots show the mean scores of the tests run on the different systems; the lower ones show their respective standard deviations
Distributions of the mean scores obtained on the different systems in the FR → FR configuration
Method for learning the projection matrix with neutralisation of the linguistic bias. Probabilistic Linear Discriminant Analysis (PLDA) estimates the similarity between two i-vectors by means of a Likelihood Ratio (LR)
Mean scores obtained with systems B and C for the test of the linguistic component in the FR → FR configuration
6.1 Siamese neural network taking two i-vector representations as input
Evaluation cases A, B, C and D. The list of characters 1, 2, ..., 16 is shuffled beforehand. The labels (soldier, officer, alien, ...) are assigned to the characters according to our own interpretation of the voices and in no way act as supervision
Representation of the characters in the i-vector space for cases A, B, C and D. Illustration obtained with t-SNE
Box plots of the distances measured between target (blue) and non-target (orange) pairs in evaluation case C. On the left, the measurements made on the development corpus; on the right, those made on the test corpus
Occurrence of the characters (involved in evaluation C) in the different quartiles computed on the prediction errors
Architectures used for comparison
Illustration of the disjoint approach used to learn the p-vector
In this illustration, the Teacher model learns to discriminate the input vectors into different classes until it produces the soft targets required to train the Student model. The two models can be trained on different datasets
Representation of the voice segments of the different characters in the x-vector space
Projection of the p-vectors into a two-dimensional space learned with the t-SNE algorithm. The axes have no particular meaning
Comparison of the learning curve of the similarity model depending on the pre-training
Results obtained with the test p-vectors on the voice-pairing task. On the left, the original approach based on i-vectors; on the right, the approach based on x-vectors (used for the p-vector approach). Diamonds represent outliers
Prediction accuracy of system A as a function of k for different comparison methods
Prediction accuracy of system B as a function of k for different comparison methods
Prediction accuracy of system C as a function of k for different comparison methods

List of tables

BF (Bottleneck Features)

Success rate of the similarity predictions of the different systems (k = 3), taking into account the results accumulated over the different folds

Success rate of the similarity predictions of the different systems (k = 3) for the test of the linguistic component, taking into account the results accumulated over the different folds

Structure of the convolutional networks used in the SNN, following the Keras nomenclature (Chollet et al. 2015)

Success rate of the predictions of the pairing model

Values of the Student's t-test statistic for the discrimination of target and non-target pairs

Comparison of the performances obtained with the different architectures

F-measure computed on the clustering analysis performed on the test p-vectors

Performance of the classifier of target and non-target pairs based on the test p-vectors. The performances on the development corpus (not shown in the table) are generally around 85 % accuracy

EER (Equal Error Rate)

J. Abitbol, ISBN: 2081289571, 2005.

Y. Adachi, S. Kawamoto, S. Morishima, and S. Nakamura, « Perceptual similarity measurement of speech by combination of acoustic features, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4861-4864, 2008.


C. Anagnostopoulos, T. Iliou, and I. Giannoukos, « Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011 », Artificial Intelligence Review, vol. 43, pp. 155-177, 2015.

A. Adamson and V. Jenson, Shrek. Prod. J. Katzenberg. DreamWorks SKG, 2001.

T. Asami, R. Masumura, Y. Yamaguchi, H. Masataki, and Y. Aono, « Domain adaptation of dnn acoustic models using knowledge distillation, International Conference on Acoustics, Speech and Signal Processing

K. Amino, T. Sugawara, and T. Arai, « Speaker similarities in human perception and their spectral properties, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WESPAC). T. 9, 2006.

N. Anand and P. Verma, « Convoluted feelings convolutional and recurrent nets for detecting emotion from audio data, 2015.

G. Bhattacharya, M. Jahangir-alam, and P. Kenny, « Deep Speaker Embeddings for Short-Duration Speaker Verification, 18th Annual Conference of the International Speech Communication Association, pp.1517-1521, 2017.

C. Bateman, Game Writing: Narrative Skills for Videogames. Applied English Series. Charles River Media, ISBN: 9781584504900, 2007.

M. Ben and F. Bimbot, « D-MAP: A distance-normalized MAP estimation of speaker models for automatic speaker verification », IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, IEEE, 2003.

O. Baumann and P. Belin, « Perceptual scaling of voice identity : common dimensions for different vowels and speakers, Psychological Research PRPF, vol.74, issue.1, p.110, 2010.

Y. Bengio, A. Courville, and P. Vincent, « Representation learning : A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.1798-1828, 2013.

P. Belin, R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike, « Voice-selective areas in human auditory cortex », Nature, vol. 403, p. 309, 2000.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning ». In : 26th Annual International Conference on Machine Learning, pp.41-48, 2009.

D. S. Berry, « Vocal types and stereotypes: Joint effects of vocal attractiveness and vocal maturity on person perception », Journal of Nonverbal Behavior, vol. 16, pp. 41-54, 1992.

P. Belin, S. Fecteau, and C. Bedard, « Thinking the voice : Neural correlates of voice perception, Trends in Cognitive Sciences, vol.8, issue.3, pp.129-135, 2004.

F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-chagnolleau et al., A tutorial on text-independent speaker verification, EURASIP Journal on Advances in Signal Processing, vol.4, p.101962, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01434501


M. Barnier and I. L. Corff, Cahiers de l'association française des enseignants et chercheurs en cinéma et audiovisuel, Mise au point, vol.5, 2013.

P. Bousquet, D. Matrouf, and J.-F. Bonastre, « Intersession compensation and scoring methods in the i-vectors space for speaker recognition », Twelfth Annual Conference of the International Speech Communication Association, 2011.

M. Babel, G. Mcguire, and J. King, « Towards a more nuanced view of vocal attractiveness, PloS one 9, vol.2, 2014.

M. H. Bahari, M. McLaren, and D. A. van Leeuwen, « Age estimation from telephone speech using i-vectors », 13th Annual Conference of the International Speech Communication Association, 2012.

C. Busso and S. S. Narayanan, « The expression and perception of emotions: Comparing assessments of self versus others », pp. 257-260, 2008.

L. Boë, « Forensic voice identification in France, Speech Communication, vol.31, pp.79-84, 2000.

J.-F. Bonastre, F. Bimbot, L.-J. Boë, J. P. Campbell, D. A. Reynolds, and I. Magrin-Chagnolleau, « Person authentication by voice: A need for caution », Eighth European Conference on Speech Communication and Technology, 2003.

J.-F. Bonastre, J. Kahn, S. Rossato, and M. Ajili, « Forensic speaker recognition: Mirages and reality », p. 255, 2015.

B. Bonhomme, « Les stars et le cinéma d'animation ». In : Mise au point. Cahiers de l'association française des enseignants et chercheurs en cinéma et audiovisuel, 2014.

P. Bousquet, A. Larcher, D. Matrouf, J.-F. Bonastre, and O. Plchot, « Variance-spectra based normalization for i-vector standard and probabilistic linear discriminant analysis », Odyssey Proceedings, 2012.

P. Bourdieu, Ce que parler veut dire : l'économie des échanges linguistiques. Fayard, ISBN: 2213012164, 1982.

J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, « Signature verification using a "siamese" time delay neural network », Advances in Neural Information Processing Systems, pp. 737-744, 1994.

L. Bruckert, P. Bestelmeyer, M. Latinus, J. Rouger, I. Charest et al., « Vocal attractiveness increases by averaging, Current Biology, vol.20, pp.116-120, 2010.

E. Brunswik, Perception and the representative design of psychological experiments, 1956.

N. Burger, The Upside. Prod. T. Black, J. Blumenthal, and S. Tisch. The Weinstein Company, 2017.

C. Busso, M. Bulut, C. Lee, A. Kazemzadeh, E. Mower et al., « IEMOCAP : Interactive emotional dyadic motion capture database, Language resources and evaluation, vol.42, p.335, 2008.

J. P. Campbell, W. Shen, W. M. Campbell, R. Schwartz, J.-F. Bonastre, and D. Matrouf, « Forensic speaker recognition », 2009.

J. P. Campbell, « Speaker recognition: A tutorial », Proceedings of the IEEE, vol. 85, pp. 1437-1462, 1997.

S. Cornaz et al., « Musique, voix chantée et apprentissage : une revue de littérature et quelques propositions d'exploitation en didactique de la phonétique des langues », 2014.

S. Chopra, R. Hadsell, and Y. Lecun, « Learning a similarity metric discriminatively, with application to face verification, IEEE Conference on Computer Vision and Pattern Recognition, pp.539-546, 2005.

F. Chollet et al., Keras, 2015.

J. S. Chung, A. Nagrani, and A. Zisserman, « VoxCeleb2: Deep Speaker Recognition », 19th Annual Conference of the International Speech Communication Association, 2018.

P. Couprie, Le vocabulaire de l'objet sonore, Ouvrages de référence, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00807080

W. M. Campbell, D. E. Sturim, and D. A. Reynolds, « Support vector machines using GMM supervectors for speaker verification », IEEE Signal Processing Letters, vol. 13, issue 5, pp. 308-311, 2006.

N. Dehak, P. Dumouchel, and P. Kenny, « Modeling prosodic features with joint factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, pp.2095-2103, 2007.

P. B. Denes and E. Pinson, The Speech Chain, 1993.

N. Dehak, R. Dehak, P. Kenny, N. Brümmer, P. Ouellet et al., « Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification, Tenth Annual Conference of the International Speech Communication Association, 2009.

N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, « Front-end factor analysis for speaker verification », IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, pp. 788-798, 2011.

N. Dehak, P. A. Torres-carrasquillo, D. Reynolds, and R. Dehak, « Language recognition via i-vectors and dimensionality reduction, Twelfth Annual Conference of the International Speech Communication Association, 2011.

T. G. Deskins, « Stereotypes in video games and how they perpetuate prejudice », McNair Scholars Research Journal, vol. 6, p. 5, 2013.

G. Degottex and N. Obin, « Phase distortion statistics as a representation of the glottal source : Application to the classification of voice qualities, 14th Annual Conference of the International Speech Communication Association, 2013.

C. Darwin and P. Prodger, The Expression of the Emotions in Man and Animals, 1872.

K. E. Dill and K. P. Thill, « Video game characters and the socialization of gender roles: Young people's perceptions mirror sexist media depictions », Sex Roles, vol. 57, pp. 851-864, 2007.


M. El Ayadi, M. S. Kamel, and F. Karray, « Survey on speech emotion recognition: Features, classification schemes, and databases », Pattern Recognition, vol. 44, pp. 572-587, 2011.

P. Ekman, « Basic emotions, Handbook of cognition and emotion 98, p.16, 1999.

H. S. Feiser and F. Kleber, « Voice similarity among brothers: evidence from a perception experiment », 21st Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), 2012.

L. Fu, X. Mao, and L. Chen, « Speaker independent emotion recognition based on SVM/HMMs fusion system, International Conference on Audio, Language and Image Processing (ICASSP). IEEE, pp.61-65, 2008.

J. R. J. Fontaine, K. R. Scherer, E. B. Roesch, and P. C. Ellsworth, « The world of emotions is not two-dimensional », Psychological Science, vol. 18, pp. 1050-1057, 2007.

L. Ferrer, N. Scheffer, and E. Shriberg, « A comparison of approaches for modeling prosodic features in speaker recognition, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2010, pp.4414-4417

S. Furui, « Cepstral analysis technique for automatic speaker verification, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.29, pp.254-272, 1981.

Y. Gambier, « La traduction audiovisuelle : un genre en expansion, Meta : journal des traducteurs/Meta : Translators' Journal, vol.49, pp.1-11, 2004.

X. Glorot and Y. Bengio, « Understanding the difficulty of training deep feedforward neural networks », Thirteenth International Conference on Artificial Intelligence and Statistics, ed. Y. W. Teh and M. Titterington, vol. 9, PMLR, pp. 249-256, 2010.


D. Ribas González and J. R. Calvo de Lara, « Speaker verification with shifted delta cepstral features: Its Pseudo-Prosodic Behaviour », First Iberian SLTech, 2009.

D. Garcia-Romero and C. Y. Espy-Wilson, « Analysis of i-vector length normalization in speaker recognition systems », Twelfth Annual Conference of the International Speech Communication Association, 2011.

R. Giesen and A. Khan, Acting and Character Animation: The Art of Animated Films, Acting and Visualizing. ISBN: 1498778631, 2017.

J. Gauvain and C. Lee, « Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE transactions on speech and audio processing, vol.2, pp.291-298, 1994.

B. Golse, Les précurseurs corporels et comportementaux du langage verbal, 2005.

H. Gunes, B. Schuller, M. Pantic, and R. Cowie, « Emotion representation, analysis and synthesis in continuous space: A survey », Face and Gesture, pp. 827-834, 2011.

J. Haton, C. Cerisara, D. Fohr, Y. Laprie, and K. Smaïli, Reconnaissance automatique de la parole : Du Signal à son Interprétation. Dunod, 2006.

R. Hadsell, S. Chopra, and Y. Lecun, « Dimensionality reduction by learning an invariant mapping, Computer vision and pattern recognition, pp.1735-1742, 2006.

A. Hassan and R. I. Damper, « Multi-class and hierarchical SVMs for emotion recognition », 11th Annual Conference of the International Speech Communication Association, 2010.

S. M. Hughes, F. Dispenza, and G. G. Gallup, « Ratings of voice attractiveness predict sexual behavior and body configuration, Evolution and Human Behavior, vol.25, pp.295-304, 2004.

N. Henrich, « Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception, vol.6, 2001.

K. Hevner, « Experimental studies of the elements of expression in music, American journal of Psychology, vol.48, pp.246-268, 1936.

C. R. Hodges-Simeon, S. J. C. Gaulin, and D. A. Puts, « Voice correlates of mating success in men: examining "contests" versus "mate choice" modes of sexual selection », Archives of Sexual Behavior, vol. 40, pp. 551-557, 2011.

J. H. L. Hansen and T. Hasan, « Speaker recognition by machines and humans: A tutorial review », IEEE Signal Processing Magazine, vol. 32, pp. 74-99, 2015.

A. O. Hatch, S. Kajarekar, and A. Stolcke, « Within-class Covariance Normalization for SVM-based Speaker Recognition », Proceedings of ICSLP, pp. 1471-1474, 2006.

A. Hill and D. A. Puts, in Encyclopedia of Evolutionary Psychological Science, ed. T. K. Shackelford and V. A. Weekes-Shackelford, pp. 1-5, 2016.

K. Huang, C. Wu, Q. Hong, M. Su, and Y. Chen, « Speech Emotion Recognition Using Deep Neural Network Considering Verbal and Nonverbal Speech Sounds, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5866-5870, 2019.

G. Hinton, O. Vinyals, and J. Dean, « Distilling the Knowledge in a Neural Network », 2014.

Y. Ijima and H. Mizuno, Similar speaker selection technique based on distance metric learning using highly correlated acoustic features with perceptual voice quality similarity, IEICE TRANSACTIONS on Information and Systems, vol.98, pp.157-165, 2015.

J. Jansz and R. G. Martis, « The Lara phenomenon: Powerful female characters in video games », Sex Roles, vol. 56, pp. 141-148, 2007.

N. M. Joy, S. R. Kothinti, S. Umesh, and B. Abraham, « Generalized distillation framework for speaker normalization », 18th Annual Conference of the International Speech Communication Association, 2017.

T. Johnstone and K. R. Scherer, « The effects of emotions on voice quality », 14th International Congress of Phonetic Sciences, pp. 2029-2032, 1999.


J. Kahn, N. Audibert, S. Rossato, and J. Bonastre, « Speaker verification by inexperienced and experienced listeners vs. speaker verification system, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5912-5915, 2011.

N. T. Kleynhans and E. Barnard, « Language dependence in multilingual speaker verification », Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, PRASA, 2005.

P. Kenny and P. Dumouchel, « Disentangling speaker and channel effects in speaker verification, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.37-40, 2004.

F. Kelly, A. Alexander, O. Forth, S. Kent, J. Lindh, and J. Åkesson, « Identifying perceptually similar voices with a speaker recognition system using auto-phonetic features », 17th Annual Conference of the International Speech Communication Association, pp. 1567-1568, 2016.

P. Kenny, T. Stafylakis, P. Ouellet, V. Gupta, and M. J. Alam, « Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition », Odyssey Proceedings, pp. 293-298, 2014.

P. Kenny, « Joint factor analysis of speaker and session variability : Theory and algorithms-technical report, Montreal, CRIM 2005, 2005.

P. Kenny, « Bayesian speaker verification with heavy-tailed priors, Odyssey Proceedings, 2010.

L. G. Kersta, « Voiceprint identification », vol. 34, pp. 725-725, 1962.

R. M. Krauss, R. Freyberg, and E. Morsella, « Inferring speakers' physical attributes from their voices », Journal of Experimental Social Psychology, vol. 38, pp. 618-625, 2002.

A. M. Kaplan and M. Haenlein, « Users of the world, unite! The challenges and opportunities of Social Media », Business Horizons, vol. 53, pp. 59-68, 2010.

T. Kinnunen, C. W. E. Koh, L. Wang, H. Li, and E. S. Chng, « Temporal discrete cosine transform: Towards longer term temporal features for speaker verification », Fifth International Symposium on Chinese Spoken Language Processing, pp. 547-558, 2006.


H. Kido and H. Kasuya, « Everyday expressions associated with voice quality of normal utterance: Extraction by perceptual evaluation », Acoustic Society of Japan, vol. 57, pp. 337-344, 2001.

P. Kenny, M. Mihoubi, and P. Dumouchel, « New MAP estimators for speaker recognition, Eighth European Conference on Speech Communication and Technology, 2003.

V. Kempe, D. A. Puts, and R. A. Cárdenas, « Masculine men articulate less clearly », Human Nature, vol. 24, pp. 461-475, 2013.

S. G. Koolagudi and K. Sreenivasa Rao, « Emotion recognition from speech: a review », International Journal of Speech Technology, vol. 15, pp. 99-117, 2012.

J. E. Kreiman, « Reconsidering the nature of voice », The Journal of the Acoustical Society of America, vol. 144, pp. 1765-1765, 2018.

B. Kulis, Metric learning : A survey, pp.287-364, 2013.

G. Koch, R. Zemel, and R. Salakhutdinov, « Siamese neural networks for one-shot image recognition ». Mém. de mast, 2015.

J. Laver, The Phonetic Description of Voice Quality. Cambridge Studies in Linguistics. ISBN: 0521231760, 1980.

M. Latinus and P. Belin, « Human voice perception, Current Biology, vol.21, pp.143-145, 2011.

D. Le Breton, Une anthropologie des voix. Métailié, ISBN: 2864248425, 2011.

J. Lindh and A. Eriksson, « Voice similarity-a comparison between judgements by human listeners and automatic voice comparison, Proceedings from FONETIK, pp.63-69, 2010.

A. M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes. F. Didot, 1805.

Y. Lei, N. Scheffer, L. Ferrer, and M. Mclaren, « A novel scheme for speaker recognition using a phoneticallyaware deep neural network », 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1695-1699, 2014.


C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, A. Kannan, Z. Zhu, et al., « Deep Speaker: an End-to-End Neural Speaker Embedding System », 2017.

J. Li, M. L. Seltzer, X. Wang, R. Zhao, and Y. Gong, « Large-scale domain adaptation via teacher-student learning », 2017.

M. Lugger and B. Yang, « Combining classifiers with diverse feature sets for robust speaker independent emotion recognition », 17th European Signal Processing Conference, pp. 1225-1229, 2009.

I. Luengo, E. Navas, and I. Hernáez, « Feature analysis and evaluation for automatic emotion identification in speech, IEEE Transactions on Multimedia, vol.12, pp.490-501, 2010.

D. Loakes, « A forensic phonetic investigation into the speech patterns of identical and non-identical twins, International Journal of Speech, Language and the Law, vol.15, pp.97-100, 2008.

D. Lopez-Paz, L. Bottou, B. Schölkopf, and V. Vapnik, « Unifying distillation and privileged information », International Conference on Learning Representations, 2016.

D. Martínez, L. Burget, L. Ferrer, and N. Scheffer, « iVector-based prosodic system for language identification », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4861-4864, 2012.

D. Matrouf, J.-F. Bonastre, and S. E. Mezaache, « Factor analysis multi-session training constraint in session compensation for speaker verification », Ninth Annual Conference of the International Speech Communication Association, 2008.

S. Mirsamadi, E. Barsoum, and C. Zhang, Automatic speech emotion recognition using recurrent neural networks with local attention, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2227-2231

R. R. McCrae, « The Five-Factor Model of personality traits: consensus and controversy », pp. 148-161, 2009.


K. Mcdougall, « Assessing perceived voice similarity using Multidimensional Scaling for the construction of voice parades, International Journal of Speech, vol.20, issue.2, 2013.

K. Mcdougall, « Listeners' perception of voice similarity in Standard Southern British English versus York English, 23rd Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), 2014.

G. Matthews, I. J. Deary, and M. C. Whiteman, Personality Traits, 2009.

M. McLaren, Y. Lei, and L. Ferrer, « Advances in deep neural network approaches to speaker recognition », 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4814-4818, 2015.

T. Mikolov, V. Quoc, I. Le, and . Sutskever, « Exploiting similarities among languages for machine translation, 2013.

K. Markov and T. Matsui, « Robust Speech Recognition Using Generalized Distillation Framework, 17th Annual Conference of the International Speech Communication Association, 2016.

J. Markel, B. Oshika, and A. Gray, « Long-term feature averaging for speaker recognition », IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 330-337, 1977.

G. Mohammadi, A. Origlia, M. Filippone, and A. Vinciarelli, « From speech to personality : Mapping voice quality and intonation into personality differences, 20th ACM International Conference on Multimedia, pp.789-792, 2012.

S. Morishima, A. Maejima, S. Wemler, T. Machida, and M. Takebayashi, ACM SIGGRAPH 2005 Sketches, p.20, 2005.

R. M. Cochrane and J. Sachs, « Phonological learning by children and adults in a laboratory setting », Language and Speech, vol. 22, pp. 145-149, 1979.

G. Mohammadi and A. Vinciarelli, Automatic personality perception : Prediction of trait attribution based on prosodic features extended abstract, pp.273-284

G. Mohammadi and A. Vinciarelli, Automatic attribution of personality traits based on prosodic features, 2012.

G. Mohammadi and A. Vinciarelli, Automatic Attribution of Personality Traits Based on Prosodic Features, ACII 2015 Affective Computing and Intelligent Interaction, vol.3, pp.29-32, 2015.

G. Mohammadi, A. Vinciarelli, and M. Mortillaro, « The voice of personality : Mapping nonverbal vocal behavior into trait attributions, Proceedings of the 2nd International Workshop on Social Signal Processing, pp.17-20, 2010.

D. Neiberg, K. Elenius, and K. Laskowski, « Emotion recognition in spontaneous speech using GMMs », Ninth International Conference on Spoken Language Processing, 2006.

T. L. Nwe, S. W. Foo, and L. C. De Silva, « Speech emotion recognition using hidden Markov models », Speech Communication, vol. 41, pp. 603-623, 2003.

C. Nass and K. M. Lee, « Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction », Journal of Experimental Psychology: Applied, vol. 7, p. 171, 2001.

F. Nolan, K. Mcdougall, and T. Hudson, « Some Acoustic Correlates of Perceived (Dis) Similarity between Sameaccent Voices, ICPhS, pp.1506-1509, 2011.

F. Nolan and T. Oh, « Identical twins, different voices, International Journal of Speech, Language and the Law, vol.3, pp.39-49, 1996.

F. Nolan, K. McDougall, G. De Jong, and T. Hudson, « The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research », International Journal of Speech, Language and the Law, vol. 16, issue 1, 2009.

F. Nolan, P. French, K. McDougall, L. Stevens, and T. Hudson, « The role of voice quality 'settings' in perceived voice similarity », International Association for Forensic Phonetics and Acoustics, 2011.

N. Obin, « Similarity search of acted voices for automatic voice casting, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, pp.1642-1651, 2016.


N. Obin, A. Roebel, and G. Bachman, « On automatic voice casting for expressive speech : Speaker recognition vs. speech classification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.950-954, 2014.

S. Oyama, « A sensitive period for the acquisition of a nonnative phonological system, Journal of psycholinguistic research, vol.5, pp.261-283, 1976.

A. D. Patel and J. R. Daniele, « An empirical comparison of rhythm in language and music », Cognition, vol. 87, 2003.

S. J. D. Prince and J. H. Elder, « Probabilistic linear discriminant analysis for inferences about identity », 2007 IEEE 11th International Conference on Computer Vision, IEEE, pp. 1-8, 2007.

R. W. Picard, Affective Computing, 2000.

K. Pisanski, P. J. Fraccaro, C. C. Tigue, J. J. M. O'Connor, et al., « Vocal indicators of body size in men and women: a meta-analysis », Animal Behaviour, vol. 95, pp. 89-99, 2014.

R. Price, K. Iso, and K. Shinoda, « Wise teachers train better DNN acoustic models », EURASIP Journal on Audio, Speech, and Music Processing, 2016.

G. Papcun, J. Kreiman, and A. Davis, « Long-term memory for unfamiliar voices, The Journal of the Acoustical Society of America, vol.85, pp.913-925, 1989.

G. Planchenault, « La commodification des voix au cinéma : un outil de différentiation et de stigmatisation langagière, Entrelacs. Cinéma et audiovisuel, p.11, 2014.

R. Plutchik, Emotions : A General Psychoevolutionary Theory, pp.197-219, 1984.


D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, G. Stemmer, K. Vesely, et al., « The Kaldi Speech Recognition Toolkit », IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, IEEE Signal Processing Society, 2011.

P. Petta, C. Pelachaud, and R. Cowie, Emotion-Oriented Systems: The HUMAINE Handbook, 2011.

D. A. Puts, J. L. Barndt, L. L. M. Welling, K. Dawood, et al., « Intrasexual competition among women: Vocal femininity affects perceptions of attractiveness and flirtatiousness », Personality and Individual Differences, vol. 50, pp. 111-115, 2011.

A. D. Patel, Y. Xu, and B. Wang, « The role of F0 variation in the intelligibility of Mandarin sentences », Speech Prosody, Fifth International Conference, 2010.

J. Révis, La voix et soi : Ce que notre voix dit de nous. ISBN: 2353272312, 2013.

D. A. Reynolds, « Comparison of background normalization methods for text-independent speaker verification », Fifth European Conference on Speech Communication and Technology, 1997.

R. E. Remez, J. M. Fellowes, and D. S. Nagel, « On the perception of similarity among talkers », The Journal of the Acoustical Society of America, vol. 122, pp. 3688-3696, 2007.

R. Riad, C. Dancette, J. Karadayi, N. Zeghidour, T. Schatz, and E. Dupoux, « Sampling strategies in Siamese Networks for unsupervised speech representation learning », 19th Annual Conference of the International Speech Communication Association, 2018.

A. Rilliard, T. Shochi, J. Martin, D. Erickson, and V. Aubergé, « Multimodal indices to Japanese and French prosodically expressed social affects », Language and Speech, vol. 52, pp. 223-243, 2009.

A. Rilliard, J. Antônio-de-moraes, D. Erickson, and T. Shochi, « Social affect production and perception across languages and cultures-the role of prosody, Leitura, vol.2, pp.15-41, 2013.


A. Rilliard, D. Erickson, J. A. de Moraes, and T. Shochi, « On the varying reception of speakers expressivity across gender and cultures, and inference in their personalities », Sonorities: speech, singing and reciting expressivity, pp. 149-163, 2016.

M. Raento, A. Oulasvirta, and N. Eagle, « Smartphones : An emerging tool for social scientists, Sociological methods & research, vol.37, pp.426-454, 2009.

P. Rose, « Differences and distinguishability in the acoustic characteristics of Hello in voices of similar-sounding speakers, Australian Review of Applied Linguistics, vol.22, pp.1-42, 1999.

M. Rouvier, P. Bousquet, M. Ajili, W. Ben Kheder, D. Matrouf, and J.-F. Bonastre, « LIA system description for NIST SRE 2016 », 2016.

D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, « Speaker verification using adapted Gaussian mixture models », Digital Signal Processing, vol. 10, pp. 19-41, 2000.

D. A. Reynolds and R. C. Rose, « Robust text-independent speaker identification using Gaussian mixture speaker models », IEEE Transactions on Speech and Audio Processing, vol. 3, issue 1, pp. 72-83, 1995.

J. A. Russell, « A circumplex model of affect », Journal of Personality and Social Psychology, vol. 39, p. 1161, 1980.

N. Sáenz-lechón, J. I. Godino-llorente, V. Osma-ruiz, M. Blanco-velasco, and F. Cruz-roldán, Automatic assessment of voice quality according to the GRBAS scale, International Conference of the IEEE Engineering in Medicine and Biology Society, pp.2478-2481, 2006.

E. S. Segundo, P. Foulkes, P. French, P. Harrison, V. Hughes et al., « Cluster analysis of voice quality ratings : Identifying groups of perceptually similar speakers, pp.173-176, 2018.

M. Sarma, P. Ghahremani, D. Povey, N. K. Goel, K. K. Sarma, and N. Dehak, « Emotion Identification from Raw Speech Signals Using DNNs », 19th Annual Conference of the International Speech Communication Association, pp. 3097-3101, 2018.


D. A. Sauter, F. Eisner, P. Ekman, and S. K. Scott, « Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations », Proceedings of the National Academy of Sciences, vol. 107, pp. 2408-2412, 2010.

A. K. Sarkar, J.-F. Bonastre, and D. Matrouf, « A study on the roles of total variability space and session variability modeling in speaker recognition », International Journal of Speech Technology, vol. 19, pp. 111-120, 2016.

K. R. Scherer, R. Banse, and H. G. Wallbott, « Emotion inferences from vocal expression correlate across languages and cultures », Journal of Cross-Cultural Psychology, vol. 32, pp. 76-92, 2001.

B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, and S. S. Narayanan, « The Interspeech 2010 paralinguistic challenge », Eleventh Annual Conference of the International Speech Communication Association, 2010.

N. Scheffer, L. Ferrer, M. Graciarena, S. Kajarekar, E. Shriberg, and A. Stolcke, « The SRI NIST 2010 speaker recognition evaluation system », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 5292-5295, 2011.

B. Schuller, A. Batliner, S. Steidl, and D. Seppi, « Recognising realistic emotions and affect in speech : State of the art and lessons learnt from the first challenge, Speech Communication, vol.53, pp.1062-1087, 2011.

B. Schuller, S. Steidl, A. Batliner, F. Schiel, and J. Krajewski, « The Interspeech 2011 speaker state challenge », Twelfth Annual Conference of the International Speech Communication Association, 2011.

B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli et al., « The interspeech 2012 speaker trait challenge, Thirteenth Annual Conference of the International Speech Communication Association, 2012.

S. Scherer, J. Kane, C. Gobl, and F. Schwenker, « Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification », Computer Speech & Language, vol. 27, pp. 263-287, 2013.


S. R. Schweinberger, H. Kawahara, A. P. Simpson, V. G. Skuk, et al., « Speaker perception », vol. 5, pp. 15-25, 2014.

K. R. Scherer, R. Banse, H. G. Wallbott, and T. Goldbeck, « Vocal cues in emotion encoding and decoding », Motivation and Emotion, vol. 15, pp. 123-148, 1991.

K. R. Scherer, « Vocal communication of emotion: A review of research paradigms », Speech Communication, vol. 40, pp. 227-256, 2003.

K. R. Scherer, « The dynamic architecture of emotion: Evidence for the component process model », Cognition and Emotion, vol. 23, pp. 1307-1351, 2009.

K. R. Scherer, « Voice quality analysis of American and German speakers », Journal of Psycholinguistic Research, vol. 3, issue 3, pp. 281-298, 1974.

K. R. Scherer, « Personality inference from voice quality: The loud voice of extroversion », European Journal of Social Psychology, vol. 8, pp. 467-487, 1978.

K. R. Scherer, Cahiers de Psychologie Cognitive / Current Psychology of Cognition, 1984.

A. Sell, G. A. Bryant, L. Cosmides, J. Tooby, et al., « Adaptations in humans for assessing physical strength from the voice », Proceedings of the Royal Society B: Biological Sciences, vol. 277, pp. 3509-3518, 2010.

U. Schimmack and A. Grob, « Dimensional models of core affect : A quantitative comparison by means of structural equation modeling, European Journal of Personality, vol.14, pp.325-345, 2000.

E. Shouse, « Feeling, Emotion, Affect », M/C Journal, vol. 8, p. 26, 2005.

S. R. Schweinberger, A. Herholz, and W. Sommer, « Recognizing famous voices: Influence of stimulus duration and different types of retrieval cues », Journal of Speech, Language, and Hearing Research, vol. 40, pp. 453-463, 1997.

E. San Segundo and J. A. Mompean, « A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity », Journal of Voice, vol. 31, pp. 644-655, 2017.

D. Snyder, P. Ghahremani, D. Povey, D. Garcia-Romero, Y. Carmiel, and S. Khudanpur, « Deep neural network-based speaker embeddings for end-to-end speaker verification », 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 165-170, 2016.

D. Snyder, D. Garcia-romero, D. Povey, and S. Khudanpur, « Deep Neural Network Embeddings for Text-Independent Speaker Verification, 18th Annual Conference of the International Speech Communication Association, 2017.

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, « X-vectors: Robust DNN embeddings for speaker recognition », 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp. 5329-5333, 2018.

F. K. Soong, A. E. Rosenberg, B. H. Juang, and L. R. Rabiner, « Report: A vector quantization approach to speaker recognition », AT&T Technical Journal, vol. 66, pp. 14-26, 1987.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, « Dropout: A Simple Way to Prevent Neural Networks from Overfitting », Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.

E. Shriberg and A. Stolcke, « The case for automatic higher-level features in forensic speaker recognition, Ninth Annual Conference of the International Speech Communication Association, 2008.

F. Schröter and J. Thon, « Video game characters. Theory and analysis, Diegesis 3.1, 2014.


M. Su, C. Wu, K. Huang, Q. Hong, and H. Wang, « Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks », 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1532-1536

A. Stanton and L. Unkrich, Le Monde de Némo. Prod. G. Walters, 2003.

T. Seehapoch and S. Wongthanavasu, « Speech emotion recognition using support vector machine », 5th International Conference on Knowledge and Smart Technology (KST), IEEE, pp. 86-91, 2013.

M. Tahon, « Analyse acoustique de la voix émotionnelle de locuteurs lors d'une interaction humain-robot, 2012.

M. Teshigawara, « Voices in Japanese Animation : How People Perceive the Voices of Good Guys and Bad Guys, Working Papers of the Linguistics Circle, vol.17, pp.149-158, 2003.

M. Teshigawara, « Random splicing : A method of investigating the effects of voice quality on impression formation, Speech Prosody 2004, International Conference, 2004.

B. Teston, L'évaluation instrumentale des dysphonies. Etat actuel et perspectives, 2004.

T. Polzehl, K. Schoenenberg, S. Möller, F. Metze, G. Mohammadi, and A. Vinciarelli, « On speaker-independent personality perception and prediction from speech », 13th Annual Conference of the International Speech Communication Association, 2012.

E. Toledano and O. Nakache, Intouchables. Prod. N. Duval Adassovsky, Y. Zenou, and L. Zeitoun. TF1 Film Production, 2011.

G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, « Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5200-5204, 2016.

K. P. Truong, D. A. van Leeuwen, and F. M. G. de Jong, « Speech-based recognition of self-reported and observed emotion in a dimensional space », Speech Communication, vol. 54, pp. 1049-1063, 2012.


P. Tzirakis, J. Zhang, W. Bjorn, and . Schuller, « Endto-end speech emotion recognition using deep neural networks », IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5089-5093, 2018.

E. Variani, X. Lei, E. Mcdermott, I. L. Moreno, and J. Gonzalez-dominguez, « Deep neural networks for small footprint text-dependent speaker verification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4052-4056, 2014.

V. Vapnik and R. Izmailov, « Learning Using Privileged Information: Similarity Control and Knowledge Transfer », Journal of Machine Learning Research, vol. 16, pp. 2023-2049, 2015.

D. Van-lancker, J. Kreiman, and K. Emmorey, « Familiar voice recognition : Patterns and parameters : I. Recognition of backward voices, Journal of phonetics, 1985.

A. Vinciarelli and G. Mohammadi, « A survey of personality computing, IEEE Transactions on Affective Computing, vol.5, pp.273-291, 2014.

J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang et al., « Learning fine-grained image similarity with deep ranking, IEEE Conference on Computer Vision and Pattern Recognition, pp.1386-1393, 2014.

S. Watanabe, T. Hori, J. Le Roux, and J. R. Hershey, « Student-teacher network learning with enhanced features », International Conference on Acoustics, Speech and Signal Processing, 2017.

J. R. Wheatley, C. L. Apicella, R. P. Burriss, et al., « Women's faces and voices are cues to reproductive potential in industrial and forager societies », Evolution and Human Behavior, vol. 35, pp. 264-271, 2014.

J. J. Wolf, « Efficient acoustic parameters for speaker recognition », The Journal of the Acoustical Society of America, vol. 51, pp. 2044-2056, 1972.

C. Wedge and C. Saldanha, L'Âge de glace. Prod. L. Forte, 2002.

R. Xia and Y. Liu, « Using i-vector space model for emotion recognition, Thirteenth Annual Conference of the International Speech Communication Association, 2012.

S. Yaman, J. Pelecanos, and R. Sarikaya, « Bottleneck features for speaker recognition », Odyssey Proceedings, 2012.

T. Yamada, L. Wang, and A. Kai, « Improvement of distant-talking speaker identification using bottleneck features of DNN, 14th Annual Conference of the International Speech Communication Association, pp.3661-3664, 2013.

E. Zetterholm, M. Blomberg, and D. Elenius, « A comparison between human perception and a speaker verification system score of a voice imitation », Tenth Australian International Conference on Speech Science & Technology, pp. 393-397, 2004.

M. Zuckerman and R. E. Driver, « What sounds beautiful is good : The vocal attractiveness stereotype, Journal of Nonverbal Behavior, vol.13, pp.67-82, 1989.

N. Zeghidour, G. Synnaeve, N. Usunier, and E. Dupoux, « Joint learning of speaker and phonetic similarities with siamese networks », 17th Annual Conference of the International Speech Communication Association, pp. 1295-1299, 2016.

N. Zeghidour, G. Synnaeve, M. Versteegh, and E. Dupoux, « A deep scattering spectrum - deep Siamese network pipeline for unsupervised acoustic modeling », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 4965-4969, 2016.

M. D. Zeiler, « ADADELTA: an adaptive learning rate method », 2012.

M. Zuckerman and K. Miyake, « The Attractive Voice: What Makes It So? », vol. 17, 1993.

C. Zhang and T. Tan, « Voice disguise and automatic speaker recognition, Forensic science international, vol.175, pp.118-122, 2008.

C. Zhang, C. Yu, and J. H. L. Hansen, « An investigation of deep-learning frameworks for speaker verification antispoofing », IEEE Journal of Selected Topics in Signal Processing, vol. 11, pp. 684-694, 2017.

A. Gresse, M. Rouvier, R. Dufour, V. Labatut, and J. Bonastre, Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization, 18th Annual Conference of the International Speech Communication Association, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01572151

A. Gresse, M. Quillot, R. Dufour, V. Labatut, and J.-F. Bonastre, « Similarity Metric Based on Siamese Neural Networks for Voice Casting », IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.