Chapter 7
2.1 The p-vector: a representation of the character's characteristic information, a representation space for the character. Contents
Homogenizing information through distillation, p.118
Knowledge distillation
Automatic voice recommendation system
2.3 Schematic of an automatic speaker verification system. Among other things, modeling yields a fixed-size representation from the variable-length sequence of parameters
Schematic of a 4-component GMM-UBM and adaptation of the speaker model with the MAP procedure, p.28
Training of the Bottleneck Features is guided by a discrimination task
Three-dimensional model of emotions, p.39, 1984.
Circumplex model of emotions, p.40, 1980.
Brunswik's lens model (Brunswik 1956), vol.46
Probabilistic Linear Discriminant Analysis (PLDA) estimates the similarity between two i-vectors by means of a Likelihood Ratio (LR). Illustration of the reference system (A) and the two variants proposed in our approach (B and C)
Naming convention of the voice segment files, p.77
Histogram of the voice segment durations in the Mass Effect 3 corpus. Segments longer than 10 s (42 segments) are not shown, for practical reasons, p.78
The top plots show the mean scores of the tests run on the different systems; the bottom plots show their respective standard deviations. Distributions of the mean scores obtained on the different systems in the FR → FR configuration
Method for learning the projection matrix with neutralization of the linguistic bias. Probabilistic Linear Discriminant Analysis (PLDA) estimates the similarity between two i-vectors by means of a Likelihood Ratio (LR), p.86
Mean scores obtained on systems B and C for the linguistic-component test in the FR → FR configuration, vol.87
List of figures. 6.1 Siamese neural networks taking two i-vector representations
The list of characters 1, 2, ..., 16 is shuffled beforehand. The labels (soldier, officer, alien...) are assigned to the characters according to our own interpretation of the voices and in no way serve as supervision
Representation of the characters in i-vector space for cases A, B, C and D. Illustration obtained with t-SNE, vol.104
Box plot of the distances measured between target (blue) and nontarget (orange) pairs in evaluation case C. Left: measurements on the development corpus; right: measurements on the test corpus, p.108
Occurrence of the characters (involved in evaluation C) in the different quartiles computed on the prediction errors
Illustration of the disjoint approach used to learn the p-vector
In this illustration, the Teacher model learns to discriminate the input vectors according to different classes until it produces the soft targets required to train the Student model. The two models can be trained on different datasets
Representation of the voice segments of the different characters in x-vector space
Projection of the p-vectors into a two-dimensional space learned with the t-SNE algorithm. The axes have no particular meaning
Comparison of the learning curve of the similarity model depending on pre-training, p.132
Results obtained with the test p-vectors on the voice matching task. Left: the original approach based on i-vectors; right: the approach based on x-vectors (used for the p-vector approach). Diamonds represent outliers
Prediction accuracy of system A as a function of k for different comparison methods, p.143
Prediction accuracy of system B as a function of k for different comparison methods, p.145
Prediction accuracy of system C as a function of k for different comparison methods, p.145
List of tables
Success rate of the similarity predictions of the different systems (k = 3). Accounts for the results accumulated over the different folds
Success rate of the similarity predictions of the different systems (k = 3) for the linguistic-component test. Accounts for the results accumulated over the different folds, p.86
Structure of the convolutional networks used in the SNN, following the Keras nomenclature (Chollet et al. 2015), p.105
Success rate of the matching model's predictions
Values of the Student's t-test statistic for discriminating target and nontarget pairs
Comparison of the performance obtained with the different architectures
computed on the clustering analysis performed on the test p-vectors
Performance of the target/nontarget pair classifier on the test p-vectors. Performance on the development corpus (not shown in the table) is generally around 85 % success, p.131
, , ISBN 2081289571, 2005.
Shigeo Morishima et Satoshi Nakamura. « Perceptual similarity measurement of speech by combination of acoustic features », IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp.4861-4864, 2008.
Theodoros Iliou et Ioannis Giannoukos. « Features and classifiers for emotion recognition from speech : a survey from », vol.43, pp.155-177, 2000.
Edited by J Katzenberg. DreamWorks SKG, 2001.
« Domain adaptation of dnn acoustic models using knowledge distillation, International Conference on Acoustics, Speech and Signal Processing ,
« Speaker similarities in human perception and their spectral properties, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WESPAC). T. 9, 2006. ,
« Convoluted feelings convolutional and recurrent nets for detecting emotion from audio data, 2015. ,
« Deep Speaker Embeddings for Short-Duration Speaker Verification, 18th Annual Conference of the International Speech Communication Association, pp.1517-1521, 2017. ,
Game Writing : Narrative Skills for Videogames. Applied English Series. Charles River Media, ISBN 9781584504900, 2007.
A distance-normalized MAP estimation of speaker models for automatic speaker verification, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). T. 2. IEEE, p.69, 2003. ,
« Perceptual scaling of voice identity : common dimensions for different vowels and speakers, Psychological Research PRPF, vol.74, issue.1, p.110, 2010. ,
« Representation learning : A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.1798-1828, 2013. ,
« Voice-selective areas in human auditory cortex, Nature, vol.403, p.309, 2000. ,
« Curriculum learning ». In : 26th Annual International Conference on Machine Learning, pp.41-48, 2009.
« Vocal types and stereotypes : Joint effects of vocal attractiveness and vocal maturity on person perception, Journal of Nonverbal Behavior, vol.16, pp.41-54, 1992. ,
« Thinking the voice : Neural correlates of voice perception, Trends in Cognitive Sciences, vol.8, issue.3, pp.129-135, 2004. ,
A tutorial on text-independent speaker verification, EURASIP Journal on Advances in Signal Processing, vol.4, p.101962, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-01434501
Cahiers de l'association française des enseignants et chercheurs en cinéma et audiovisuel, Mise au point, vol.5, 2013. ,
Driss Matrouf et Jean-François Bonastre. « Intersession compensation and scoring methods in the i-vectors space for speaker recognition, Twelfth Annual Conference of the International Speech Communication Association, 2011. ,
« Towards a more nuanced view of vocal attractiveness, PloS one 9, vol.2, 2014. ,
Age estimation from telephone speech using i-vectors, 13th Annual Conference of the International Speech Communication Association, 2012. ,
« The expression and perception of emotions : Comparing assessments of self versus others, pp.257-260, 2008. ,
« Forensic voice identification in France, Speech Communication, vol.31, pp.79-84, 2000. ,
« Person authentication by voice : A need for caution, Eighth European Conference on Speech Communication and Technology, 2003. ,
Solange Rossato et Moez Ajili. « Forensic speaker recognition : Mirages and reality, p.255, 2015. ,
« Les stars et le cinéma d'animation ». In : Mise au point. Cahiers de l'association française des enseignants et chercheurs en cinéma et audiovisuel, 2014. ,
Driss Matrouf, Jean-François Bonastre et Oldřich Plchot. « Variance-spectra based normalization for i-vector standard and probabilistic linear discriminant analysis », Odyssey Proceedings, 2012.
Ce que parler veut dire : l'économie des échanges linguistiques. Fayard, ISBN 2213012164, 1982.
Eduard Säckinger et Roopak Shah. « Signature verification using a "siamese" time delay neural network », Advances in Neural Information Processing Systems, pp.737-744, 1994. ,
« Vocal attractiveness increases by averaging, Current Biology, vol.20, pp.116-120, 2010. ,
Perception and the representative design of psychological experiments, 1956. ,
The Upside. Edited by T. Black, J Blumenthal and S. Tisch. The Weinstein Company, 2017.
« IEMOCAP : Interactive emotional dyadic motion capture database, Language resources and evaluation, vol.42, p.335, 2008. ,
Jean-François Bonastre et Driss Matrouf. « Forensic speaker recognition, 2009. ,
« Speaker recognition : A tutorial, Proceedings of the IEEE, vol.85, pp.1437-1462, 1997. ,
voix chantée et apprentissage : une revue de littérature et quelques propositions d'exploitation en didactique de la phonétique des langues, 2014. ,
« Learning a similarity metric discriminatively, with application to face verification, IEEE Conference on Computer Vision and Pattern Recognition, pp.539-546, 2005. ,
, , 2015.
« VoxCeleb2 : Deep Speaker Recognition ». In : 19th Annual Conference of the International Speech Communication Association, 2018.
Le vocabulaire de l'objet sonore, Ouvrages de référence, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-00807080
« Support vector machines using GMM supervectors for speaker verification, IEEE signal processing letters, vol.13, issue.5, pp.308-311, 2006. ,
« Modeling prosodic features with joint factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, pp.2095-2103, 2007. ,
The speech chain, 1993. ,
« Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification, Tenth Annual Conference of the International Speech Communication Association, 2009. ,
« Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, pp.788-798, 2010. ,
« Front end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, pp.788-798, 2011. ,
« Language recognition via i-vectors and dimensionality reduction, Twelfth Annual Conference of the International Speech Communication Association, 2011. ,
« Stereotypes in video games and how they perpetuate prejudice, McNair Scholars Research Journal, vol.6, p.5, 2013. ,
« Phase distortion statistics as a representation of the glottal source : Application to the classification of voice qualities, 14th Annual Conference of the International Speech Communication Association, 2013. ,
The expression of the emotions in man and animals, 1872.
« Video game characters and the socialization of gender roles : Young people's perceptions mirror sexist media depictions, Sex roles, vol.57, pp.851-864, 2007. ,
« Survey on speech emotion recognition : Features, classification schemes, and databases, Pattern Recognition, vol.44, pp.572-587, 2011. ,
« Basic emotions, Handbook of cognition and emotion 98, p.16, 1999. ,
« Voice similarity among brothers : evidence from a perception experiment, 21st Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), 2012. ,
« Speaker independent emotion recognition based on SVM/HMMs fusion system », International Conference on Audio, Language and Image Processing (ICALIP). IEEE, pp.61-65, 2008.
« The world of emotions is not two-dimensional, Psychological science, vol.18, pp.1050-1057, 2007. ,
« A comparison of approaches for modeling prosodic features in speaker recognition », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp.4414-4417, 2010.
« Cepstral analysis technique for automatic speaker verification, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.29, pp.254-272, 1981. ,
« La traduction audiovisuelle : un genre en expansion, Meta : journal des traducteurs/Meta : Translators' Journal, vol.49, pp.1-11, 2004. ,
, Thirteenth International Conference on Artificial Intelligence and Statistics. Edited by Yee Whye Teh and Mike Titterington. Vol. 9. PMLR, pp.249-256, 2010.
« Speaker verification with shifted delta cepstral features : Its Pseudo-Prosodic Behaviour, First Iberian SLTech, 2009. ,
« Analysis of i-vector length normalization in speaker recognition systems, Twelfth Annual Conference of the International Speech Communication Association, 2011. ,
Acting and Character Animation : The Art of Animated Films, Acting and Visualizing, ISBN 1498778631, 2017.
« Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE transactions on speech and audio processing, vol.2, pp.291-298, 1994. ,
, Les précurseurs corporels et comportementaux du langage verbal, 2005.
Maja Pantic et Roddy Cowie. « Emotion representation, analysis and synthesis in continuous space : A survey, Face and Gesture, pp.827-834, 2011. ,
Reconnaissance automatique de la parole : Du Signal à son Interprétation. Dunod, 2006. ,
« Dimensionality reduction by learning an invariant mapping, Computer vision and pattern recognition, pp.1735-1742, 2006. ,
« Multi-class and hierarchical SVMs for emotion recognition, 11th Annual Conference of the International Speech Communication Association, 2010. ,
« Ratings of voice attractiveness predict sexual behavior and body configuration, Evolution and Human Behavior, vol.25, pp.295-304, 2004. ,
« Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception, vol.6, 2001. ,
« Experimental studies of the elements of expression in music, American journal of Psychology, vol.48, pp.246-268, 1936. ,
, « Voice correlates of mating success in men : examining "contests" versus "mate choice" modes of sexual selection, Archives of Sexual Behavior, vol.40, pp.551-557, 2011.
« Speaker recognition by machines and humans : A tutorial review, IEEE Signal processing magazine, vol.32, pp.74-99, 2015. ,
Sachin Kajarekar et Andreas Stolcke. « Within-class Covariance Normalization for SVM-based Speaker Recognition », Proceedings of ICSLP, pp.1471-1474, 2006.
, Encyclopedia of Evolutionary Psychological Science, pp.1-5, 2016.
« Speech Emotion Recognition Using Deep Neural Network Considering Verbal and Nonverbal Speech Sounds, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5866-5870, 2019. ,
, Oriol Vinyals et Jeffrey Dean. « Distilling the Knowledge in a Neural Network, 2014.
Similar speaker selection technique based on distance metric learning using highly correlated acoustic features with perceptual voice quality similarity, IEICE TRANSACTIONS on Information and Systems, vol.98, pp.157-165, 2015. ,
The Lara phenomenon : Powerful female characters in video games, Sex roles, vol.56, pp.141-148, 2007. ,
Srinivasan Umesh et Basil Abraham. « Generalized distillation framework for speaker normalization, 18th Annual Conference of the International Speech Communication Association, 2017. ,
« The effects of emotions on voice quality, 14th International Congress of Phonetic Sciences, pp.2029-2032, 1999. ,
« Speaker verification by inexperienced and experienced listeners vs. speaker verification system, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5912-5915, 2011. ,
« Language dependence in multilingual speaker verification, Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa. PRASA, 2005. ,
« Disentangling speaker and channel effects in speaker verification, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.37-40, 2004. ,
Jonas Lindh et Joel Åkesson. « Identifying perceptually similar voices with a speaker recognition system using auto-phonetic features, 17th Annual Conference of the International Speech Communication Association, pp.1567-1568, 2016. ,
Vishwa Gupta et Md Jahangir Alam. « Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition, Odyssey Proceedings, pp.293-298, 2014. ,
« Joint factor analysis of speaker and session variability : Theory and algorithms-technical report, Montreal, CRIM 2005, 2005. ,
« Bayesian speaker verification with heavy-tailed priors, Odyssey Proceedings, 2010. ,
, Voiceprint identification, vol.34, pp.725-725, 1962.
Robin Freyberg et Ezequiel Morsella. « Inferring speakers' physical attributes from their voices, Journal of Experimental Social Psychology, vol.38, pp.510-513, 2002. ,
« Users of the world, unite ! The challenges and opportunities of Social Media, Business horizons, vol.53, pp.59-68, 2010. ,
Haizhou Li et Eng Siong Chng. « Temporal discrete cosine transform : Towards longer term temporal features for speaker verification, Fifth International Symposium on Chinese Spoken Language Processing, pp.547-558, 2006. ,
« Everyday expressions associated with voice quality of normal utterance-Extraction by perceptual evaluation, Acoustic Society of Japan, vol.57, pp.337-344, 2001. ,
« New MAP estimators for speaker recognition, Eighth European Conference on Speech Communication and Technology, 2003. ,
Masculine men articulate less clearly, Human Nature, vol.24, pp.461-475, 2013. ,
« Emotion recognition from speech : a review, International journal of speech technology, vol.15, pp.99-117, 2012. ,
« Reconsidering the nature of voice, The Journal of the Acoustical Society of America, vol.144, pp.1765-1765, 2018. ,
Metric learning : A survey, pp.287-364, 2013. ,
« Siamese neural networks for one-shot image recognition ». Master's thesis, 2015.
The phonetic description of voice quality : Cambridge Studies in Linguistics, ISBN 0521231760, 1980.
« Human voice perception, Current Biology, vol.21, pp.143-145, 2011. ,
Une anthropologie des voix. Métailié, ISBN 2864248425, 2011.
« Voice similarity-a comparison between judgements by human listeners and automatic voice comparison, Proceedings from FONETIK, pp.63-69, 2010. ,
Nouvelles méthodes pour la détermination des orbites des comètes. F. Didot, 1805.
« A novel scheme for speaker recognition using a phonetically-aware deep neural network », 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1695-1699, 2014.
Ajay Kannan et Zhenyao Zhu. « Deep Speaker : an End-to-End Neural Speaker Embedding System, 2017. ,
Rui Zhao et Yifan Gong. « Large-scale domain adaptation via teacher-student learning, 2017. ,
« Combining classifiers with diverse feature sets for robust speaker independent emotion recognition, 17th European Signal Processing Conference, pp.1225-1229, 2009. ,
« Feature analysis and evaluation for automatic emotion identification in speech, IEEE Transactions on Multimedia, vol.12, pp.490-501, 2010. ,
« A forensic phonetic investigation into the speech patterns of identical and non-identical twins, International Journal of Speech, Language and the Law, vol.15, pp.97-100, 2008. ,
Bernhard Schölkopf et Vladimir Vapnik. « Unifying distillation and privileged information, International Conference on Learning Representations, 2016. ,
Luciana Ferrer et Nicolas Scheffer. « iVector-based prosodic system for language identification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4861-4864, 2012. ,
Bonastre et Salah Eddine Mezaache. « Factor analysis multi-session training constraint in session compensation for speaker verification, Ninth Annual Conference of the International Speech Communication Association, 2008. ,
Automatic speech emotion recognition using recurrent neural networks with local attention, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2227-2231 ,
The Five-Factor Model of personality traits : consensus and controversy. Sous la dir, pp.148-161, 2009. ,
« Assessing perceived voice similarity using Multidimensional Scaling for the construction of voice parades, International Journal of Speech, vol.20, issue.2, 2013. ,
« Listeners' perception of voice similarity in Standard Southern British English versus York English, 23rd Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), 2014. ,
, , 2009.
, 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp.4814-4818, 2015.
« Exploiting similarities among languages for machine translation, 2013. ,
« Robust Speech Recognition Using Generalized Distillation Framework, 17th Annual Conference of the International Speech Communication Association, 2016. ,
« Long-term feature averaging for speaker recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.25, pp.330-337, 1977. ,
« From speech to personality : Mapping voice quality and intonation into personality differences, 20th ACM International Conference on Multimedia, pp.789-792, 2012. ,
, ACM SIGGRAPH 2005 Sketches, p.20, 2005.
« Phonological learning by children and adults in a laboratory setting, Language and Speech, vol.22, pp.145-149, 1979. ,
Automatic personality perception : Prediction of trait attribution based on prosodic features extended abstract, pp.273-284 ,
Automatic attribution of personality traits based on prosodic features, 2012. ,
Automatic Attribution of Personality Traits Based on Prosodic Features, ACII 2015 Affective Computing and Intelligent Interaction, vol.3, pp.29-32, 2015. ,
« The voice of personality : Mapping nonverbal vocal behavior into trait attributions, Proceedings of the 2nd International Workshop on Social Signal Processing, pp.17-20, 2010. ,
Kjell Elenius et Kornel Laskowski. « Emotion recognition in spontaneous speech using GMMs, Ninth International Conference on Spoken Language Processing, 2006. ,
« Speech emotion recognition using hidden Markov models, Speech communication, vol.41, pp.99-101, 2003. ,
« Does computer-synthesized speech manifest personality ? Experimental tests of recognition, similarity-attraction, and consistency-attraction, Journal of experimental psychology : applied, vol.7, p.171, 2001. ,
« Some Acoustic Correlates of Perceived (Dis)Similarity between Same-accent Voices », ICPhS, pp.1506-1509, 2011.
« Identical twins, different voices, International Journal of Speech, Language and the Law, vol.3, pp.39-49, 1996. ,
Gea De Jong et Toby Hudson. « The DyViS database : style-controlled recordings of 100 homogeneous speakers for forensic phonetic research, International Journal of Speech, vol.16, issue.1, 2009. ,
Louisa Stevens et Toby Hudson. « The role of voice quality 'settings' in perceived voice similarity, International Association for Forensic Phonetics and Acoustics, 2011. ,
« Similarity search of acted voices for automatic voice casting, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, pp.1642-1651, 2016. ,
« On automatic voice casting for expressive speech : Speaker recognition vs. speech classification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.950-954, 2014. ,
« A sensitive period for the acquisition of a nonnative phonological system, Journal of psycholinguistic research, vol.5, pp.261-283, 1976. ,
« An empirical comparison of rhythm in language and music, Cognition, vol.87, issue.02, pp.187-194, 2003. ,
« Probabilistic linear discriminant analysis for inferences about identity, 2007 IEEE 11th International Conference on Computer Vision. IEEE, pp.1-8, 2007. ,
Affective computing, 2000. ,
« Vocal indicators of body size in men and women : a meta-analysis, Animal Behaviour, vol.95, pp.89-99, 2014. ,
« Wise teachers train better DNN acoustic models, Speech, and Music Processing, 2016. ,
« Long-term memory for unfamiliar voices, The Journal of the Acoustical Society of America, vol.85, pp.913-925, 1989. ,
« La commodification des voix au cinéma : un outil de différentiation et de stigmatisation langagière, Entrelacs. Cinéma et audiovisuel, p.11, 2014. ,
Emotions : A General Psychoevolutionary Theory, pp.197-219, 1984. ,
Lukas Burget et Ondrej Glembek. « The Kaldi speech recognition toolkit, IEEE Workshop on Automatic Speech Recognition and Understanding, 2011. ,
Georg Stemmer et Karel Vesely. « The Kaldi Speech Recognition Toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011. ,
Emotion-oriented systems : the HUMAINE handbook, pp.978-981, 2011.
« Intrasexual competition among women : Vocal femininity affects perceptions of attractiveness and flirtatiousness, Personality and Individual Differences, vol.50, pp.111-115, 2011. ,
« The role of F0 variation in the intelligibility of Mandarin sentences, Speech Prosody Fifth International Conference, 2010. ,
La voix et soi : Ce que notre voix dit de nous, ISBN 2353272312, 2013.
Comparison of background normalization methods for text-independent speaker verification, Fifth European Conference on Speech Communication and Technology, 1997. ,
On the perception of similarity among talkers, The Journal of the Acoustical Society of America, vol.122, pp.3688-3696, 2007. ,
Neil Zeghidour, Thomas Schatz et Emmanuel Dupoux. « Sampling strategies in Siamese Networks for unsupervised speech representation learning, 19th Annual Conference of the International Speech Communication Association, 2018. ,
Donna Erickson et Véronique Aubergé. « Multimodal indices to Japanese and French prosodically expressed social affects, Language and speech, vol.52, pp.223-243, 2009. ,
« Social affect production and perception across languages and cultures-the role of prosody, Leitura, vol.2, pp.15-41, 2013. ,
« On the varying reception of speakers expressivity across gender and cultures, and inference in their personalities, Sonorities : speech, singing and reciting expressivity, pp.149-163, 2016. ,
« Smartphones : An emerging tool for social scientists, Sociological methods & research, vol.37, pp.426-454, 2009. ,
« Differences and distinguishability in the acoustic characteristics of Hello in voices of similar-sounding speakers, Australian Review of Applied Linguistics, vol.22, pp.1-42, 1999. ,
Waad Ben Kheder, Driss Matrouf et Jean-François Bonastre. « LIA system description for NIST SRE, 2016. ,
, « Speaker verification using adapted Gaussian mixture models, Digital signal processing, vol.10, pp.19-41, 2000.
« Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing, vol.3, issue.1, pp.72-83, 1995. ,
A circumplex model of affect, Journal of personality and social psychology, vol.39, p.1161, 1980. ,
Automatic assessment of voice quality according to the GRBAS scale, International Conference of the IEEE Engineering in Medicine and Biology Society, pp.2478-2481, 2006. ,
« Cluster analysis of voice quality ratings : Identifying groups of perceptually similar speakers, pp.173-176, 2018. ,
Kandarpa Kumar Sarma et Najim Dehak. « Emotion Identification from Raw Speech Signals Using DNNs, 19th Annual Conference of the International Speech Communication Association, pp.3097-3101, 2018. ,
« Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations, Proceedings of the National Academy of Sciences, vol.107, pp.2408-2412, 2010. ,
« A study on the roles of total variability space and session variability modeling in speaker recognition, International Journal of Speech Technology, vol.19, pp.111-120, 2016. ,
« Emotion inferences from vocal expression correlate across languages and cultures, Journal of Cross-cultural psychology, vol.32, pp.76-92, 2001. ,
Christian Müller et Shrikanth S Narayanan. « The interspeech 2010 paralinguistic challenge », Eleventh Annual Conference of the International Speech Communication Association, 2010.
Elizabeth Shriberg et Andreas Stolcke. « The SRI NIST 2010 speaker recognition evaluation system, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp.5292-5295, 2011. ,
« Recognising realistic emotions and affect in speech : State of the art and lessons learnt from the first challenge, Speech Communication, vol.53, pp.1062-1087, 2011. ,
Florian Schiel et Jarek Krajewski. « The interspeech 2011 speaker state challenge », Twelfth Annual Conference of the International Speech Communication Association, 2011.
« The interspeech 2012 speaker trait challenge, Thirteenth Annual Conference of the International Speech Communication Association, 2012. ,
Christer Gobl et Friedhelm Schwenker. « Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification, Computer Speech & Language, vol.27, pp.263-287, 2013. ,
, Speaker perception, vol.5, pp.15-25, 2014.
« Vocal cues in emotion encoding and decoding, Motivation and emotion, vol.15, pp.123-148, 1991. ,
« Vocal communication of emotion : A review of research paradigms, Speech Communication, vol.40, pp.84-89, 2003. ,
« The dynamic architecture of emotion : Evidence for the component process model, Cognition and emotion, vol.23, pp.1307-1351, 2009. ,
« Voice quality analysis of American and German speakers, Journal of Psycholinguistic Research, vol.3, issue.3, pp.281-298, 1974. ,
« Personality inference from voice quality : The loud voice of extroversion, European Journal of Social Psychology, vol.8, pp.467-487, 1978. ,
, Cahiers De Psychologie Cognitive/Current Psychology Of Cognition, 1984.
« Adaptations in humans for assessing physical strength from the voice, Proceedings of the Royal Society B : Biological Sciences, vol.277, pp.3509-3518, 2010. ,
« Dimensional models of core affect : A quantitative comparison by means of structural equation modeling, European Journal of Personality, vol.14, pp.325-345, 2000. ,
, M/c journal, vol.8, p.26, 2005.
« Recognizing famous voices : Influence of stimulus duration and different types of retrieval cues », Journal of Speech, Language, and Hearing Research, vol.40, pp.453-463, 1997. ,
« A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity », Journal of Voice, vol.31, pp.644-655, 2017. ,
Yishay Carmiel and Sanjeev Khudanpur. « Deep neural network-based speaker embeddings for end-to-end speaker verification », 2016 IEEE Spoken Language Technology Workshop (SLT), pp.165-170, 2016. ,
« Deep Neural Network Embeddings for Text-Independent Speaker Verification », 18th Annual Conference of the International Speech Communication Association, 2017. ,
Daniel Povey and Sanjeev Khudanpur. « X-vectors : Robust DNN embeddings for speaker recognition », 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp.5329-5333, 2018. ,
Report : A vector quantization approach to speaker recognition, AT&T technical journal, vol.66, pp.14-26, 1987. ,
Ilya Sutskever and Ruslan Salakhutdinov. « Dropout : A Simple Way to Prevent Neural Networks from Overfitting », Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014. ,
« The case for automatic higher-level features in forensic speaker recognition », Ninth Annual Conference of the International Speech Communication Association, 2008. ,
« Video game characters. Theory and analysis », Diegesis 3.1, 2014. ,
« Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks », 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1532-1536 ,
Le Monde de Némo. Under the direction of G Walters, 2003. ,
« Speech emotion recognition using support vector machine », 5th International Conference on Knowledge and Smart Technology (KST), IEEE, pp.86-91, 2013. ,
« Analyse acoustique de la voix émotionnelle de locuteurs lors d'une interaction humain-robot », 2012. ,
« Voices in Japanese Animation : How People Perceive the Voices of Good Guys and Bad Guys », Working Papers of the Linguistics Circle, vol.17, pp.149-158, 2003. ,
« Random splicing : A method of investigating the effects of voice quality on impression formation », Speech Prosody 2004, International Conference, 2004. ,
L'évaluation instrumentale des dysphonies. Etat actuel et perspectives, 2004. ,
Gelareh Mohammadi and Alessandro Vinciarelli. « On speaker-independent personality perception and prediction from speech », 13th Annual Conference of the International Speech Communication Association, 2012. ,
, Under the direction of N Duval Adassovsky, Y Zenou and L Zeitoun. TF1 Film Production, 2011.
Bjorn Schuller and Stefanos Zafeiriou. « Adieu features ? End-to-end speech emotion recognition using a deep convolutional recurrent network », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). T, pp.5200-5204, 2016. ,
« Speech-based recognition of self-reported and observed emotion in a dimensional space », Speech Communication, vol.54, pp.1049-1063, 2012. ,
« End-to-end speech emotion recognition using deep neural networks », IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5089-5093, 2018. ,
« Deep neural networks for small footprint text-dependent speaker verification », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4052-4056, 2014. ,
, « Learning Using Privileged Information : Similarity Control and Knowledge Transfer », vol.16, pp.2023-2049, 2015.
« Familiar voice recognition : Patterns and parameters : I. Recognition of backward voices », Journal of phonetics, 1985. ,
« A survey of personality computing », IEEE Transactions on Affective Computing, vol.5, pp.273-291, 2014. ,
« Learning fine-grained image similarity with deep ranking », IEEE Conference on Computer Vision and Pattern Recognition, pp.1386-1393, 2014. ,
« Student-teacher network learning with enhanced features », International Conference on Acoustics, Speech and Signal Processing ,
« Women's faces and voices are cues to reproductive potential in industrial and forager societies », Evolution and Human Behavior, vol.35, pp.264-271, 2014. ,
« Efficient acoustic parameters for speaker recognition », The Journal of the Acoustical Society of America, vol.51, pp.2044-2056, 1972. ,
Under the direction of L Forte, 2002. ,
« Using i-vector space model for emotion recognition », Thirteenth Annual Conference of the International Speech Communication Association, 2012. ,
Jason Pelecanos and Ruhi Sarikaya. « Bottleneck features for speaker recognition », Odyssey Proceedings, 2012. ,
« Improvement of distant-talking speaker identification using bottleneck features of DNN », 14th Annual Conference of the International Speech Communication Association, pp.3661-3664, 2013. ,
Mats Blomberg and Daniel Elenius. « A comparison between human perception and a speaker verification system score of a voice imitation », Tenth Australian International Conference on Speech Science & Technology, pp.393-397, 2004. ,
« What sounds beautiful is good : The vocal attractiveness stereotype », Journal of Nonverbal Behavior, vol.13, pp.67-82, 1989. ,
Nicolas Usunier and Emmanuel Dupoux. « Joint learning of speaker and phonetic similarities with siamese networks », 17th Annual Conference of the International Speech Communication Association, pp.1295-1299, 2016. ,
Maarten Versteegh and Emmanuel Dupoux. « A deep scattering spectrum-Deep Siamese network pipeline for unsupervised acoustic modeling », IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp.4965-4969, 2016. ,
« ADADELTA : an adaptive learning rate method », 2012. ,
, The Attractive Voice : What Makes It So ? » In : Health, vol.17, 1993.
« Voice disguise and automatic speaker recognition », Forensic science international, vol.175, pp.118-122, 2008. ,
« An investigation of deep-learning frameworks for speaker verification antispoofing », IEEE Journal of Selected Topics in Signal Processing, vol.11, pp.684-694, 2017. ,
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization, 18th Annual Conference of the International Speech Communication Association, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01572151
« Similarity Metric Based on Siamese Neural Networks for Voice Casting », IEEE International Conference on Acoustics, Speech and Signal Processing, 2019. ,