Index: Analysis-by-synthesis loop, p.123
List of Tables: 3.1 Datasets for affect recognition systems, p.39

N. d'Alessandro, J. Tilmanne, and M. Astrinaki, Towards the Sketching of Performative Control with Data, 2013.

J. Ahn et al., Asymmetric facial expressions: revealing richer emotions for embodied conversational agents, Computer Animation and Virtual Worlds, vol.24, issue.5, pp.539-551, 2013.
DOI : 10.1002/cav.1539

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.364.5796

Anderson et al., Expressive Visual Text-to-Speech Using Active Appearance Models, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.3382-3389, 2013.
DOI : 10.1109/CVPR.2013.434

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.9709

V. Aubergé and G. Bailly, Generation of intonation: a global approach, EUROSPEECH, 1995.

Bailly et al., Automatic labeling of large prosodic databases: Tools, methodology and links with a text-to-speech system, The ESCA Workshop on Speech Synthesis, pp.77-86, 1991.

Bailly et al., Audiovisual speech synthesis, International Journal of Speech Technology, vol.6, issue.4, pp.331-346, 2003.
DOI : 10.1023/A:1025700715107

URL : https://hal.archives-ouvertes.fr/hal-00169556

G. Bailly and I. Gorisch, Generating German intonation with a trainable prosodic model, Interspeech, pp.2366-2369, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00366490

G. Bailly and B. Holm, SFC: A trainable prosodic model, Speech Communication, vol.46, issue.3-4, pp.348-364, 2005.
DOI : 10.1016/j.specom.2005.04.008

URL : https://hal.archives-ouvertes.fr/hal-00416724

Bailly et al., Gaze, conversational agents and face-to-face communication, Speech Communication, vol.52, issue.6, pp.598-612, 2010.
DOI : 10.1016/j.specom.2010.02.015

URL : https://hal.archives-ouvertes.fr/hal-00480335

P. Barbosa, Caractérisation et génération automatique de la structuration rythmique du français, 1994.

P. A. Barbosa and G. Bailly, Generation of Pauses Within the z-score Model, Progress in Speech Synthesis, pp.365-381, 1997.
DOI : 10.1007/978-1-4612-1894-4_30

Barbulescu et al., Audio-visual speaker conversion using prosody features, International Conference on Auditory-Visual Speech Processing, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00842928

S. Baron-Cohen, Mind reading: the interactive guide to emotions, 2003.

S. Baron-Cohen et al., Is There a "Language of the Eyes"? Evidence from Normal Adults, and Adults with Autism or Asperger Syndrome, Visual Cognition, vol.4, issue.3, pp.311-331, 1997.
DOI : 10.1111/j.1469-7610.1986.tb00189.x

Bentivoglio et al., Analysis of blink rate patterns in normal subjects, Movement Disorders, vol.12, issue.6, pp.1028-1034, 1997.
DOI : 10.1002/mds.870120629

D. J. Berndt and J. Clifford, Using dynamic time warping to find patterns in time series, KDD workshop, pp.359-370, 1994.

Beskow et al., Visual correlates to prominence in several expressive modes, INTERSPEECH, 2006.

A. W. Black and A. J. Hunt, Generating F0 contours from ToBI labels using linear regression, Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96, pp.1385-1388, 1996.
DOI : 10.1109/ICSLP.1996.607872

P. Boersma, Praat, a system for doing phonetics by computer, pp.341-345, 2002.

Bogert et al., The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, Proceedings of the Symposium on Time Series Analysis (M. Rosenblatt, ed.), pp.209-243, 1963.

D. Bolinger, Intonation and its uses: Melody in grammar and discourse, 1989.

Bregler et al., Video Rewrite: driving visual speech with audio, Proceedings of the 24th annual conference on Computer graphics and interactive techniques, SIGGRAPH '97, pp.353-360, 1997.
DOI : 10.1145/258734.258880

Bulut et al., Expressive speech synthesis using a concatenative synthesizer, 2002.

Busso et al., IEMOCAP: Interactive emotional dyadic motion capture database, Language Resources and Evaluation, vol.42, issue.4, pp.335-359, 2008.
DOI : 10.1007/s10579-008-9076-6

Busso et al., Rigid Head Motion in Expressive Speech Animation: Analysis and Synthesis, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, pp.1075-1086, 2007.
DOI : 10.1109/TASL.2006.885910

Busso et al., Natural head motion synthesis driven by acoustic prosodic features, Computer Animation and Virtual Worlds, vol.16, issue.3-4, pp.283-290, 2005.
DOI : 10.1002/cav.80

C. Busso and S. S. Narayanan, Interrelation between speech and facial gestures in emotional utterances: a single subject study, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.8, pp.2331-2347, 2007.

Cafaro et al., Representing Communicative Functions in SAIBA with a Unified Function Markup Language, Intelligent Virtual Agents, pp.81-94, 2014.
DOI : 10.1007/978-3-319-09767-1_11

J. E. Cahn, Generating expression in synthesized speech, 1989.

S. Calinon and A. Billard, Incremental learning of gestures by imitation in a humanoid robot, Proceeding of the ACM/IEEE international conference on Human-robot interaction, HRI '07, pp.255-262, 2007.
DOI : 10.1145/1228716.1228751

W. N. Campbell, Syllable-based segmental duration. Talking machines: Theories, models, and designs, pp.211-224, 1992.

Cao et al., Real-time speech motion synthesis from recorded motions, Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, SCA '04, pp.345-353, 2004.
DOI : 10.1145/1028523.1028570

Cao et al., Unsupervised learning for speech motion editing, Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation, pp.225-231, 2003.

Cao et al., Expressive speech-driven facial animation, ACM Transactions on Graphics, vol.24, issue.4, pp.1283-1302, 2005.
DOI : 10.1145/1095878.1095881

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.86.7008

Cassell et al., Animated conversation, Proceedings of the 21st annual conference on Computer graphics and interactive techniques, SIGGRAPH '94, pp.413-420, 1994.
DOI : 10.1145/192161.192272

Cassell et al., BEAT: the Behavior Expression Animation Toolkit, Proceedings of the 28th annual conference on Computer graphics and interactive techniques, SIGGRAPH '01, pp.477-486, 2001.
DOI : 10.1145/383259.383315

Cavé et al., About the relationship between eyebrow movements and F0 variations, Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96, pp.2175-2178, 1996.
DOI : 10.1109/ICSLP.1996.607235

E. Chuang and C. Bregler, Performance driven facial animation using blendshape interpolation, Computer Science Technical Report, vol.2, issue.2, p.3, 2002.

E. Chuang and C. Bregler, Mood swings: expressive speech animation, ACM Transactions on Graphics, vol.24, issue.2, pp.331-347, 2005.
DOI : 10.1145/1061347.1061355

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.3306

M. M. Cohen and D. W. Massaro, Modeling Coarticulation in Synthetic Visual Speech, Models and techniques in computer animation, pp.139-156, 1993.
DOI : 10.1007/978-4-431-66911-1_13

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.1587

Cowie et al., Beyond emotion archetypes: Databases for emotion modelling using neural networks, Neural Networks, vol.18, issue.4, pp.371-388, 2005.
DOI : 10.1016/j.neunet.2005.03.002

Cvejic et al., Prosody for the eyes: quantifying visual prosody using guided principal component analysis, INTERSPEECH, pp.1433-1436, 2010.

C. Darwin, The Expression of the Emotions in Man and Animals, 1872.

Davis et al., The stability of mouth movements for multiple talkers over multiple sessions, Proceedings of the 2015 FAAVSP, 2015.

de Moraes et al., Multimodal perception and production of attitudinal meaning in Brazilian Portuguese, Proc. Speech Prosody, paper 340, 2010.

S. de Tournemire, Recherche d'une stylisation extrême des contours de F0 en vue de leur apprentissage automatique, pp.75-80, 1994.

P. Debevec, The light stages and their applications to photoreal digital actors, 2012.

Deng et al., Audio-based head motion synthesis for avatar-based telepresence systems, Proceedings of the 2004 ACM SIGMM workshop on Effective telepresence, pp.24-30, 2004.
DOI : 10.1145/1026776.1026784

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6354

Ding et al., Speech-driven eyebrow motion synthesis with contextual Markovian models, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3756-3760, 2013.
DOI : 10.1109/ICASSP.2013.6638360

URL : https://hal.archives-ouvertes.fr/hal-01215185

Dunbar et al., Human conversational behavior, Human Nature, vol.8, issue.3, pp.231-246, 1997.
DOI : 10.1007/BF02912493

T. Dutoit, An introduction to text-to-speech synthesis, 1997.
DOI : 10.1007/978-94-011-5730-8

Eide et al., A corpus-based approach to expressive speech synthesis, Fifth ISCA Workshop on Speech Synthesis, 2004.

P. Ekman, Cross-cultural studies of facial expressions. Darwin and facial expression: A century of research in review, pp.169-229, 1973.

P. Ekman, An argument for basic emotions, Cognition &amp; Emotion, vol.6, issue.3-4, pp.169-200, 1992.
DOI : 10.1080/02699939208411068

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.454.1984

P. Ekman and W. V. Friesen, Constants across cultures in the face and emotion, Journal of Personality and Social Psychology, vol.17, issue.2, p.124, 1971.
DOI : 10.1037/h0030377

P. Ekman and W. V. Friesen, Facial action coding system, 1977.

P. Ekman and K. Scherer, Expression and the nature of emotion, Approaches to emotion, pp.319-344, 1984.

T. Ezzat and T. Poggio, Videorealistic talking faces: A morphing approach, Audio-Visual Speech Processing: Computational &amp; Cognitive Science Approaches, pp.141-144, 1997.

Fanelli et al., A 3-D audio-visual corpus of affective communication, IEEE Transactions on Multimedia, vol.12, issue.6, pp.591-598, 2010.

Fónagy et al., Clichés mélodiques, Folia Linguistica, issue.1-4, p.153, 1983.

Fukayama et al., Messages embedded in gaze of interface agents: impression management with agent's gaze, Proceedings of the SIGCHI conference on Human factors in computing systems, CHI '02, pp.41-48, 2002.
DOI : 10.1145/503376.503385

Gagneré et al., La simulation du travail théâtral et sa "notation" informatique, 2012.

Graf et al., Visual prosody: facial movements accompanying speech, Proceedings of Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp.396-401, 2002.
DOI : 10.1109/AFGR.2002.1004186

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.61

B. Granström and D. House, Audiovisual representation of prosody in expressive speech communication, Speech Communication, vol.46, issue.3-4, pp.473-484, 2005.
DOI : 10.1016/j.specom.2005.02.017

Granström et al., Multimodal feedback cues in human-machine interactions, Speech Prosody 2002, International Conference, 2002.

Haan et al., An anatomy of Dutch question intonation, Linguistics in the Netherlands, pp.97-108, 1997.

V. J. van Heuven, Introducing prosodic phonetics, Experimental studies of Indonesian prosody, pp.1-26, 1994.

Heylen et al., The Next Step towards a Function Markup Language, Intelligent Virtual Agents, pp.270-280, 2008.
DOI : 10.1007/978-3-540-85483-8_28

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.468.5534

Hönemann et al., Classification of auditory-visual attitudes in German, International Conference on Auditory-Visual Speech Processing, 2015.

Hofer et al., Speech-driven head motion synthesis based on a trajectory model, 2007.

A. Hyvärinen et al., Independent Component Analysis, 2001.

Ichim et al., Dynamic 3D avatar creation from hand-held video input, ACM Transactions on Graphics, vol.34, issue.4, p.45, 2015.
DOI : 10.1145/2766974

URL : https://infoscience.epfl.ch/record/210237/files/avatars_sg2015_paper.pdf

C. E. Izard, The psychology of emotions, 1991.
DOI : 10.1007/978-1-4899-0615-1

J. Jimenez, Separable subsurface scattering and photorealistic eyes rendering, presented at the Advances in Real-Time Rendering in Games course at ACM SIGGRAPH, 2012.
DOI : 10.1111/cgf.12529

J. 't Hart, R. Collier, and A. Cohen, A perceptual study of intonation, 1990.
DOI : 10.1017/CBO9780511627743

W. L. Johnson, Dramatic expression in opera, and its implications for conversational agents, 2003.

Kaulard et al., The MPI Facial Expression Database: A Validated Database of Emotional and Conversational Facial Expressions, PLoS ONE, vol.7, issue.5, e32321, 2012.
DOI : 10.1371/journal.pone.0032321.s002

H. Kawahara, STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds, Acoustical Science and Technology, vol.27, issue.6, pp.349-353, 2006.
DOI : 10.1250/ast.27.349

A. Kendon, Do gestures communicate? A review, Research on Language and Social Interaction, vol.27, issue.3, pp.175-200, 1994.
DOI : 10.1207/s15327973rlsi2703_2

Kopp et al., Towards a Common Framework for Multimodal Generation: The Behavior Markup Language, Intelligent virtual agents, pp.205-217, 2006.
DOI : 10.1007/11821830_17

Krahmer et al., Pitch, eyebrows and the perception of focus, Speech Prosody 2002, International Conference, 2002.

E. Krahmer and M. Swerts, Audiovisual Prosody: Introduction to the Special Issue, Language and Speech, vol.52, issue.2-3, pp.129-133, 2009.
DOI : 10.1177/0023830909103164

Le et al., Live speech driven head-and-eye motion generators, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.11, pp.1902-1914, 2012.
DOI : 10.1109/tvcg.2012.74

S. Le Maguer, Évaluation expérimentale d'un système statistique de synthèse de la parole, HTS, pour la langue française, 2013.

Lee et al., Thoughts on FML: Behavior generation in the virtual human communication architecture, Proceedings of FML, pp.83-95, 2008.

Lee et al., Eyes alive, ACM Transactions on Graphics, vol.21, pp.637-644, 2002.
DOI : 10.1145/566654.566629

Lee et al., Realistic modeling for facial animation, Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, SIGGRAPH '95, pp.55-62, 1995.
DOI : 10.1145/218380.218407

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.9127

Levine et al., Gesture controllers, ACM Transactions on Graphics, vol.29, p.124, 2010.
DOI : 10.1145/1833351.1778861

Levine et al., Real-time prosody-driven synthesis of body language, ACM Transactions on Graphics, vol.28, issue.5, p.172, 2009.

K. Liu and J. Ostermann, Realistic facial expression synthesis for an image-based talking head, 2011 IEEE International Conference on Multimedia and Expo, pp.1-6, 2011.
DOI : 10.1109/ICME.2011.6011835

Lucey et al., The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp.94-101, 2010.
DOI : 10.1109/CVPRW.2010.5543262

X. Ma and Z. Deng, Natural eye motion synthesis by modeling gaze-head coupling, Virtual Reality Conference, pp.143-150, 2009.

Mac et al., Audiovisual prosody of social attitudes in Vietnamese: building and evaluating a tones balanced corpus, Tenth Annual Conference of the International Speech Communication Association, pp.2263-2266, 2009.

Mac et al., Modeling the prosody of Vietnamese attitudes for expressive speech synthesis, SLTU, pp.114-118, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00959133

N. Magnenat-Thalmann and D. Thalmann, Handbook of virtual humans, 2005.
DOI : 10.1002/0470023198

N. Magnenat-Thalmann and D. Thalmann, Virtual humans: thirty years of research, what next?, The Visual Computer, pp.997-1015, 2005.

M. Malcangi, Text-driven avatars based on artificial neural networks and fuzzy logic, International journal of computers, vol.4, issue.2, pp.61-69, 2010.

Marsella et al., Virtual character performance from speech, Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '13, pp.25-35, 2013.
DOI : 10.1145/2485895.2485900

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.353.1028

D. W. Massaro and S. E. Palmer, Perceiving talking faces: From speech perception to a behavioral principle, 1998.

Masuko et al., Speech synthesis using HMMs with dynamic features, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.389-392, 1996.
DOI : 10.1109/ICASSP.1996.541114

W. Mattheyses and W. Verhelst, Audiovisual speech synthesis: An overview of the state-of-the-art, Speech Communication, vol.66, pp.182-217, 2015.
DOI : 10.1016/j.specom.2014.11.001

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

M. Mori, The uncanny valley, pp.33-38, 1970.

Mori et al., Emotional Speech Synthesis using Subspace Constraints in Prosody, 2006 IEEE International Conference on Multimedia and Expo, pp.1093-1096, 2006.
DOI : 10.1109/ICME.2006.262725

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.420.8899

Morlec et al., Synthesis and evaluation of intonation with a superposition model, EUROSPEECH, pp.2043-2046, 1995.

Morlec et al., Generating prosodic attitudes in French: Data, model and evaluation, Speech Communication, vol.33, issue.4, pp.357-371, 2001.
DOI : 10.1016/S0167-6393(00)00065-0

Munhall et al., Visual Prosody and Speech Intelligibility: Head Movement Improves Auditory Speech Perception, Psychological Science, vol.15, issue.2, pp.133-137, 2004.
DOI : 10.1016/S0167-6393(98)00048-X

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.324.1159

I. R. Murray and J. L. Arnott, Implementation and testing of a system for producing emotion-by-rule in synthetic speech, Speech Communication, vol.16, issue.4, pp.369-390, 1995.
DOI : 10.1016/0167-6393(95)00005-9

T. Nakano and S. Kitazawa, Eyeblink entrainment at breakpoints of speech, Experimental Brain Research, pp.577-581, 2010.
DOI : 10.1007/s00221-010-2387-z

Niewiadomski et al., Greta: an interactive expressive ECA system, Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.1399-1400, 2009.

J. J. Ohala, Ethological theory and the expression of emotion in the voice, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, pp.1812-1815, 1996.
DOI : 10.1109/ICSLP.1996.607982

Ortony et al., Effective functioning: A three level model of affect, motivation, cognition, and behavior, Who needs emotions, pp.173-202, 2005.

J. Ostermann, Animation of synthetic faces in MPEG-4, Proceedings Computer Animation '98 (Cat. No.98EX169), pp.49-55, 1998.
DOI : 10.1109/CA.1998.681907

Ouni et al., Acoustic-visual synthesis technique using bimodal unit-selection, EURASIP Journal on Audio, Speech, and Music Processing, 2013.
DOI : 10.1016/j.specom.2004.11.008

URL : https://hal.archives-ouvertes.fr/hal-00835854

Oyekoya et al., Eyelid kinematics for virtual characters, Computer Animation and Virtual Worlds, vol.21, issue.3-4, p.161, 2010.

F. I. Parke, A parametric model for human faces, 1974.

Pelachaud et al., Generating Facial Expressions for Speech, Cognitive Science, vol.20, issue.1, pp.1-46, 1996.
DOI : 10.1207/s15516709cog2001_1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.7035

K. Perlin and A. Goldberg, Improv: a system for scripting interactive actors in virtual worlds, Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, SIGGRAPH '96, pp.205-216, 1996.
DOI : 10.1145/237170.237258

R. Ronfard, Notation et reconnaissance des actions scéniques par ordinateur, 2012.

Roon et al., Coordination of eyebrow movement with speech acoustics and movement, 2015.

Savran et al., Bosphorus Database for 3D Face Analysis, Biometrics and Identity Management, pp.47-56, 2008.
DOI : 10.1007/978-3-540-89991-4_6

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.6819

K. R. Scherer, Vocal affect expression: A review and a model for future research., Psychological Bulletin, vol.99, issue.2, p.143, 1986.
DOI : 10.1037/0033-2909.99.2.143

K. R. Scherer and H. Ellgring, Multimodal expression of emotion: Affect programs or componential appraisal patterns?, Emotion, vol.7, issue.1, p.158, 2007.
DOI : 10.1037/1528-3542.7.1.158

Scherer et al., Perception Markup Language: Towards a Standardized Representation of Perceived Nonverbal Behaviors, Intelligent virtual agents, pp.455-463, 2012.
DOI : 10.1007/978-3-642-33197-8_47

K. R. Scherer, Paralinguistic behaviour: Internal push or external pull?, Language: social psychological perspectives: selected papers from the first International Conference on Social Psychology and Language, p.279, 1979.

A. Schnitzler, Hands around. Privately Printed for Subscribers, 1920.

M. Schröder and S. Breuer, XML representation languages as a way of interconnecting TTS modules, INTERSPEECH, 2004.

Schröder et al., What Should a Generic Emotion Markup Language Be Able to Represent?, Affective Computing and Intelligent Interaction, pp.440-451, 2007.
DOI : 10.1007/978-3-540-74889-2_39

M. Schröder and J. Trouvain, The German text-to-speech synthesis system MARY: A tool for research, development and teaching, International Journal of Speech Technology, vol.6, issue.4, pp.365-377, 2003.

Sendra et al., Perceiving incredulity: The role of intonation and facial gestures, Journal of Pragmatics, vol.47, issue.1, pp.1-13, 2013.
DOI : 10.1016/j.pragma.2012.08.008

R. J. Srinivasan and D. W. Massaro, Perceiving Prosody from the Face and Voice: Distinguishing Statements from Echoic Questions in English, Language and Speech, vol.46, issue.1, pp.1-22, 2003.
DOI : 10.1177/00238309030460010201

Stylianou et al., Statistical methods for voice quality transformation, 1995.

M. Swerts and E. Krahmer, Visual prosody of newsreaders: Effects of information structure, emotional content and intended audience on facial expressions, Journal of Phonetics, vol.38, issue.2, pp.197-206, 2010.

Tao et al., Prosody conversion from neutral speech to emotional speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.4, pp.1145-1154, 2006.

P. Taylor and A. W. Black, Concept-to-speech synthesis by phonological structure matching, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.358, issue.1769, pp.623-626, 1999.
DOI : 10.1098/rsta.2000.0594

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.9996

Taylor et al., Dynamic units of visual speech, Proceedings of the 11th ACM SIGGRAPH/Eurographics conference on Computer Animation, Eurographics Association, pp.275-284, 2012.

B. Theobald, Audiovisual speech synthesis, International Congress on Phonetic Sciences, pp.285-290, 2007.

Thiebaux et al., Real-time expressive gaze animation for virtual humans, Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.321-328, 2009.

Toda et al., Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis, Fifth ISCA Workshop on Speech Synthesis, 2004.
DOI : 10.1016/j.specom.2007.09.001

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.157.6833

Toda et al., Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.8, pp.2222-2235, 2007.

Vandeventer et al., A 4D database of natural, dyadic conversations, International Conference on Auditory-Visual Speech Processing, 2015.

C. Veaux and X. Rodet, Intonation conversion from neutral to expressive speech, INTERSPEECH, pp.2765-2768, 2011.

Vertegaal et al., Eye gaze patterns in conversations, Proceedings of the SIGCHI conference on Human factors in computing systems, CHI '01, pp.301-308, 2001.
DOI : 10.1145/365024.365119

Vilhjálmsson et al., The Behavior Markup Language: Recent Developments and Challenges, Intelligent virtual agents, pp.99-111, 2007.
DOI : 10.1007/978-3-540-74997-4_10

H. H. Vilhjálmsson, Animating Conversation in Online Games, Entertainment Computing - ICEC 2004, pp.139-150, 2004.
DOI : 10.1007/978-3-540-39396-2_15

Vlasic et al., Face transfer with multilinear models, ACM Transactions on Graphics, vol.24, issue.3, pp.426-433, 2005.
DOI : 10.1145/1073204.1073209

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.9158

Wang et al., Synthesizing photo-real talking head via trajectory-guided sample selection, INTERSPEECH, pp.446-449, 2010.

K. Waters, A muscle model for animating three-dimensional facial expression, ACM SIGGRAPH Computer Graphics, vol.21, issue.4, pp.17-24, 1987.
DOI : 10.1145/37402.37405

Weise et al., Realtime performance-based facial animation, ACM Transactions on Graphics (Proceedings SIGGRAPH 2011), vol.30, issue.4, article 77, 2011.
DOI : 10.1145/1964921.1964972

URL : http://cgit.nutn.edu.tw:8080/cgit/PaperDL/LZJ_111124083035.PDF

Wu et al., Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, pp.1394-1405, 2010.

Y. Xu, Speech prosody: A methodological review, Journal of Speech Sciences, vol.1, issue.1, pp.85-115, 2011.

Zeng et al., A survey of affect recognition methods, Proceedings of the ninth international conference on Multimodal interfaces, ICMI '07, pp.39-58, 2009.
DOI : 10.1145/1322192.1322216