C. Abry and L. Boë, "Laws" for lips, Speech Communication, vol.5, issue.1, pp.97-104, 1986.
DOI : 10.1016/0167-6393(86)90032-4

C. Abry, J. Orliaguet, and R. Sock, Patterns of speech phasing. Their robustness in the production of a timed linguistic task: single versus double (abutted) consonants in French, European Bulletin of Cognitive Psychology, vol.10, pp.263-288, 1990.

L. M. Arslan and D. Talkin, 3-D Face Point Trajectory Synthesis using an Automatically derived Visual Phoneme Similarity matrix, 1998.

V. Attina, La Langue française Parlée Complétée (LPC) : production et perception, 2005.

P. Badin, G. Bailly, L. Revéret, M. Baciu, C. Segebarth et al., Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images, Journal of Phonetics, vol.30, issue.3, pp.533-553, 2002.
DOI : 10.1006/jpho.2002.0166

URL : https://hal.archives-ouvertes.fr/hal-00798689

G. Bailly, Audiovisual speech synthesis. Pages 1-10 of: ETRW on Speech Synthesis, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00169556

G. Bailly, G. Gibert, and M. Odisio, Evaluation of movement generation systems using the point-light technique, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002., 2002.
DOI : 10.1109/WSS.2002.1224365

G. Bailly, M. Bérar, F. Elisei, and M. Odisio, Audiovisual speech synthesis, International Journal of Speech Technology, vol.6, issue.4, pp.331-346, 2003.
DOI : 10.1023/A:1025700715107

URL : https://hal.archives-ouvertes.fr/hal-00169556

G. Bailly, F. Elisei, P. Badin, and C. Savariaux, Degrees of freedom of facial movements in face-to-face conversational speech, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00195551

G. Bailly, O. Govokhina, and G. Breton, Multimodal control of talking heads, The Journal of the Acoustical Society of America, vol.123, issue.5, 2008.
DOI : 10.1121/1.2936014

G. Bailly, O. Govokhina, G. Breton, F. Elisei, and C. Savariaux, The trainable trajectory formation model TD-HMM parameterized for the LIPS 2008 challenge, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00339043

S. Basu, N. Oliver, and A. Pentland, 3D lip shapes from video: A combined physical-statistical model, Speech Communication, vol.26, issue.1-2, pp.131-148, 1998.
DOI : 10.1016/S0167-6393(98)00055-7

D. Beautemps, L. Girin, N. Aboutabit, G. Bailly, L. Besacier et al., TELMA : Téléphonie à l'usage des malentendants, des modèles aux tests d'usage. In : Conférence Internationale sur l'Accessibilité et les systèmes de suppléance aux personnes en situation de Handicaps (ASSISTH), 2007.

T. Beier and S. Neely, Feature-based image metamorphosis, Computer graphics, pp.35-42, 1992.
DOI : 10.1145/133994.134003

F. Berthommier, Direct Synthesis of Video from Speech Sounds for New Telecommunication Applications, 2003.

J. Beskow, Rule-based Visual Speech Synthesis, Pages 299-302 of: Proceedings of Eurospeech '95, 1995.

J. Beskow, Talking Heads - Models and Applications for Multimodal Speech Synthesis, 2003.

J. Beskow, Trainable Articulatory Control Models for Visual Speech Synthesis, International Journal of Speech Technology, vol.7, issue.4, pp.335-349, 2004.
DOI : 10.1023/B:IJST.0000037076.86366.8d

R. Bowden, Learning non-linear Models of Shape and Motion, 2000.

C. Bregler, M. Covell, and M. Slaney, Video Rewrite: driving visual speech with audio, Proceedings of the 24th annual conference on Computer graphics and interactive techniques, SIGGRAPH '97, 1997.
DOI : 10.1145/258734.258880

G. Breton, C. Bouville, and D. Pelé, FaceEngine a 3D facial animation engine for real time applications, Proceedings of the sixth international conference on 3D Web technology , Web3D '01, pp.15-22, 2001.
DOI : 10.1145/363361.363367

N. Brooke and S. D. Scott, Two and Three-Dimensional Audio-Visual Speech Synthesis, 1998.

C. P. Browman and L. M. Goldstein, Gestural specification using dynamically-defined articulatory structures, Journal of Phonetics, vol.18, issue.3, pp.299-320, 1990.

Calliope, La parole et son traitement automatique, 1989.

N. Campbell, CHATR: A High-Definition Speech Re-Sequencing System, 1995.

N. Campbell and S. D. Isard, Segment durations in a syllable frame, Journal of Phonetics, vol.19, pp.37-47, 1991.

A. Caplier, S. Stillittano, O. Aran, L. Akarun, G. Bailly et al., Image and video for hearing impaired people, EURASIP Journal on Image and Video Processing, p.14, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00275029

Y. Chang and T. Ezzat, Transferable videorealistic speech animation. Pages 143-151 of: Eurographics Symposium on Computer Animation, 2005.
DOI : 10.1145/1073368.1073388

URL : http://cerboli.mit.edu:8000/publications/SCA05.pdf

M. M. Cohen and D. W. Massaro, Modeling coarticulation in synthetic visual speech. Pages 139-156 of: Models and Techniques in Computer Animation, 1993.

M. M. Cohen, D. W. Massaro, and R. Clark, Training a talking head, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces, 2002.
DOI : 10.1109/ICMI.2002.1167046

T. F. Cootes, G. J. Edwards, and C. J. Taylor, Active appearance models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.6, pp.681-685, 2001.
DOI : 10.1109/34.927467

R. O. Cornett, Cued Speech, manual complement to lipreading, for visual reception of spoken language. Principles, practice and prospects for automation, Acta Oto-Rhino-Laryngologica Belgica, vol.42, issue.3, pp.375-384, 1988.

E. Cosatto and H. P. Graf, Photo-realistic talking-heads from image samples. Pages 152-163 of: IEEE Transactions on Multimedia, vol.2, 2000.

P. Cosi, E. Magno Caldognetto, G. Perin, and C. Zmarich, Labial coarticulation modeling for realistic facial animation, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces, 2002.
DOI : 10.1109/ICMI.2002.1167047

D. Cosker, D. Marshall, P. Rosin, and Y. Hicks, Video realistic talking heads using hierarchical non-linear speech-appearance models, 2003.

B. Couteau, Y. Payan, and S. Lavallée, The mesh-matching algorithm: an automatic 3D mesh generator for finite element structures, Journal of Biomechanics, vol.33, issue.8, pp.1005-1009, 2000.
DOI : 10.1016/S0021-9290(00)00055-5

URL : https://hal.archives-ouvertes.fr/hal-00082218

Z. Deng, J. P. Lewis, and U. Neumann, Synthesizing speech animation by learning compact speech co-articulation models, CGI '05: Proceedings of the Computer Graphics International 2005, pp.19-25, 2005.

N. F. Dixon and L. Spitz, The Detection of Auditory Visual Desynchrony, Perception, vol.9, pp.719-721, 1980.
DOI : 10.1068/p090719

R. Donovan, Trainable Speech Synthesis, 1996.

P. Ekman and W. Friesen, Facial Action Coding System (FACS): A technique for the measurement of facial action, 1978.

F. Elisei, M. Odisio, G. Bailly, and P. Badin, Creating and controlling video-realistic talking heads, 2001.

O. Engwall, Are statistical MRI data representative of dynamic speech? Results from a comparative study using MRI, EMA and EPG, Pages 17-20 of: International Conference on Speech and Language Processing, 2000.

O. Engwall, Evaluation of a system for concatenative articulatory visual speech synthesis. Pages 665-668 of: Proc. of ICSLP, 2002.

E. J. Eriksson, K. P. Sullivan, and P. E. Czigler, The importance of anticipatory coarticulation in the perception of round in Swedish front vowels: an investigation comparing natural speech with diphone synthesis, 2002.

T. Ezzat and T. Poggio, MikeTalk: a talking facial display based on morphing visemes, Proceedings of Computer Animation '98, 1998.
DOI : 10.1109/CA.1998.681913

T. Ezzat, G. Geiger, and T. Poggio, MARY101: A trainable videorealistic speech animation system. Pages 57 of: Audiovisual Speech Processing, 2002.

S. Fagel, Joint Audio-Visual Unit Selection - The JAVUS Speech Synthesizer, International Conference on Speech and Computer, 2006.

S. Fagel and C. Clemens, An articulation model for audiovisual speech synthesis - Determination, adjustment, evaluation, Speech Communication, vol.44, issue.1-4, pp.141-154, 2004.
DOI : 10.1016/j.specom.2004.10.006

G. Geiger, T. Ezzat, and T. Poggio, Perceptual evaluation of videorealistic speech, 2003.

G. Gibert, Conception et Evaluation d'un système de synthèse 3D de Langue française Parlée Complétée (LPC) à partir du texte, 2006.

F. Girosi, M. Jones, and T. Poggio, Regularization Theory and Neural Networks Architectures, Neural Computation, vol.7, issue.2, pp.219-269, 1995.
DOI : 10.1016/0893-6080(90)90004-5

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9258

O. Govokhina, G. Bailly, G. Breton, and P. Bagshaw, Evaluation de systèmes de génération de mouvements faciaux, 2006.

O. Govokhina, G. Bailly, G. Breton, and P. Bagshaw, TDA: A new trainable trajectory formation system for facial animation, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00366489

O. Govokhina, G. Bailly, and G. Breton, Learning optimal audiovisual phasing for a HMM-based control model for facial animation, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00169576

J. Groleau, M. Chabanas, C. Marecaux, N. Payrard, B. Segaud et al., A biomechanical model of the face including muscles for the prediction of deformations during speech production, Proceedings of the 5th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00267458

A. Hallgren and B. Lyberg, Visual speech synthesis with concatenative speech, 1998.

T. J. Hazen, Visual model structures and synchrony constraints for audio-visual speech recognition, Pages 1082-1089 of: IEEE Transactions on Audio, Speech and Language Processing, 2006.
DOI : 10.1109/TSA.2005.857572

S. Hiroya and M. Honda, Estimation of Articulatory Movements From Speech Acoustics Using an HMM-Based Speech Production Model, Pages 175-185 of: IEEE Transactions on Speech and Audio Processing, 2004.
DOI : 10.1109/TSA.2003.822636

P. Hong, Z. Wen, and T. S. Huang, Real-Time Speech-Driven Face Animation. Pages 115-124 of: MPEG-4 Facial Animation, 2002.
DOI : 10.1002/0470854626.ch7

W. J. Hardcastle and N. Hewlett, Coarticulation: Theory, Data, and Techniques, 1999.
DOI : 10.1017/CBO9780511486395

P. Kakumanu, Analysis and evaluation of factors affecting speech driven facial animation, 2003.

P. Kakumanu, R. Gutierrez-osuna, A. Esposito, and O. N. Garcia, Comparing Different Acoustic Data-Encoding for Speech-Driven Facial Animation, Speech Communication, pp.598-615, 2002.

H. Klaus, H. Klix, J. Sotscheck, and K. Fellbaum, An evaluation system for ascertaining the quality of synthetic speech based on subjective category rating tests, Pages 1679-1682 of: Proceedings of the Third European Conference on Speech Communication and Technology, 1993.

V. Kozhevnikov and L. Chistovich, Speech: Articulation and Perception, Joint Publications Research Service, pp.1779-1793, 1965.

A. Lanitis, C. J. Taylor, and T. F. Cootes, A unified approach to coding and interpreting face images, Proceedings of IEEE International Conference on Computer Vision, 1995.
DOI : 10.1109/ICCV.1995.466919

B. Le Goff, T. Guiard-Marigny, M. M. Cohen, and C. Benoît, Real-Time Analysis-Synthesis and Intelligibility of Talking Faces, Pages 53-56 of: Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, 1994.

Y. Lee, D. Terzopoulos, and K. Waters, Realistic modeling for facial animation, Proceedings of the 22nd annual conference on Computer graphics and interactive techniques , SIGGRAPH '95, 1995.
DOI : 10.1145/218380.218407

B. Le Goff and C. Benoît, A text-to-audiovisual-speech synthesizer for French, Pages 2163-2166 of: International Conference on Spoken Language Processing (ICSLP), 1996.

A. Löfqvist, Speech as audible gestures. Speech Production and Speech Modeling, pp.289-322, 1990.

J. C. Lucero, K. G. Munhall, E. Vatikiotis-Bateson, V. L. Gracco, and D. Terzopoulos, Muscle-based modeling of facial dynamics during speech, The Journal of the Acoustical Society of America, vol.101, issue.5, pp.3175-3176, 1997.
DOI : 10.1121/1.419316

B. Mak and E. Barnard, Phone clustering using the Bhattacharyya distance, Proceedings of the Fourth International Conference on Spoken Language Processing, ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607191

M. Marschark, D. Lepoutre, and L. Bement, Mouth movement and signed communication, 1998.

D. W. Massaro, Perceiving Talking Faces: From Speech Perception to a Behavioral Principle, 1998.

D. W. Massaro and D. G. Stork, Speech Recognition and Sensory Integration, American Scientist, vol.86, issue.3, pp.236-244, 1998.
DOI : 10.1511/1998.25.861

D. W. Massaro, J. Beskow, M. M. Cohen, C. L. Fry, and T. Rodriquez, Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks, Pages 133-138 of: Proceedings of AVSP'99, 1999.

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

S. Minnis and A. P. Breen, Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis, Pages 759-762 of: International Conference on Speech and Language Processing, 1998.

M. A. Nazari, Y. Payan, P. Perrier, M. Chabanas, and C. Lobos, A continuous biomechanical model of the face: a study of muscles coordinations for speech lip gestures, Proc. of ISSP, 2008.

T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, A Style Control Technique for HMM-Based Expressive Speech Synthesis, IEICE Transactions on Information and Systems, vol.E90-D, issue.9, p.1406, 2007.
DOI : 10.1093/ietisy/e90-d.9.1406

M. Odisio and G. Bailly, Audiovisual perceptual evaluation of resynthesised speech movements, Pages 2029-2032 of: Proceedings of the International Conference on Spoken Language Processing, 2004.

M. Odisio, G. Bailly, and F. Elisei, Tracking talking faces with shape and appearance models, Speech Communication, vol.44, issue.1-4, pp.63-82, 2004.
DOI : 10.1016/j.specom.2004.10.008

S. E. Öhman, Numerical model of coarticulation, Journal of the Acoustical Society of America, pp.310-320, 1967.

T. Öhman, An audio-visual speech database and automatic measurements of visual speech, 1998.

T. Okadome, T. Kaburagi, and M. Honda, Articulatory movement formation by kinematic triphone model, IEEE SMC'99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics, 1999.
DOI : 10.1109/ICSMC.1999.825306

T. Okadome, S. Suzuki, and M. Honda, Recovery of articulatory movements from acoustics with phonemic information, Pages 229-232 of: Proceedings of the 5th Seminar on Speech Production, 2000.

J. Olives, R. Möttönen, J. Kulju, and M. Sams, Audio-Visual Speech Synthesis for Finnish, 1999.

I. Pandzic, J. Ostermann, and D. Millen, User evaluation: synthetic talking faces for interactive services, The Visual Computer, pp.330-340, 1999.

F. I. Parke, A parametric model for human faces, 1974.

F. I. Parke, Parameterized Models for Facial Animation, IEEE Computer Graphics and Applications, vol.2, issue.9, pp.61-70, 1982.
DOI : 10.1109/MCG.1982.1674492

C. Pelachaud, N. Badler, and M. Viaud, Generating Facial Expressions for Speech, Cognitive Science, vol.20, issue.1, pp.1-46, 1996.
DOI : 10.1207/s15516709cog2001_1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.7035

J. S. Perkell and C. Chiang, Preliminary support for a "hybrid model" of anticipatory coarticulation, 1986.

F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin, Synthesizing Realistic Facial Expressions from Photographs, 1998.

M. Pitermann, Chaos dans la modélisation des tissus mous, 2004.

S. M. Platt and N. I. Badler, Animating facial expressions, ACM SIGGRAPH Computer Graphics, vol.15, issue.3, pp.245-252, 1981.
DOI : 10.1145/965161.806812

G. Potamianos, C. Neti, J. Luettin, and I. Matthews, Audiovisual automatic speech recognition, 2004.
DOI : 10.1017/CBO9780511843891.011

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Pages 267-296 of: Readings in Speech Recognition, 1989.

L. Revéret, G. Bailly, and P. Badin, MOTHER: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation, Pages 755-758 of: International Conference on Speech and Language Processing, 2000.

E. L. Saltzman and K. G. Munhall, A Dynamical Approach to Gestural Patterning in Speech Production, Ecological Psychology, vol.1, issue.4, pp.333-382, 1989.

M. Schroeder, Determination of the Geometry of the Human Vocal Tract by Acoustic Measurements, The Journal of the Acoustical Society of America, vol.41, issue.4B, pp.1002-1010, 1967.
DOI : 10.1121/1.1910429

K. C. Scott, D. S. Kagels, S. H. Watson, H. Rom, J. R. Wright et al., Synthesis of speaker facial movement to match selected speech sequences, 1994.

M. Stone, A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data, The Journal of the Acoustical Society of America, vol.87, issue.5, pp.2207-2217, 1990.
DOI : 10.1121/1.399188

W. H. Sumby and I. Pollack, Visual Contribution to Speech Intelligibility in Noise, The Journal of the Acoustical Society of America, vol.26, issue.2, pp.212-215, 1954.
DOI : 10.1121/1.1907309

Q. Summerfield, Some preliminaries to a comprehensive account of audio-visual speech perception. Pages 3-51 of: Hearing by Eye: The Psychology of Lipreading, 1987.

M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing, IEICE Transactions on Information and Systems, vol.E88-D, issue.11, p.2484, 2005.
DOI : 10.1093/ietisy/e88-d.11.2484

M. Tamura, T. Masuko, T. Kobayashi, and K. Tokuda, Visual speech synthesis based on parameter generation from HMM: speech-driven and text-and-speech-driven approaches, 1998.

M. Tamura, S. Kondo, T. Masuko, and T. Kobayashi, Text-to-audiovisual speech synthesis based on parameter generation from HMM, Pages 959-962 of: European Conference on Speech Communication and Technology, 1999.

P. Taylor and A. W. Black, Concept-to-speech synthesis by phonological structure matching, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.358, issue.1769, 1999.
DOI : 10.1098/rsta.2000.0594

D. Terzopoulos and K. Waters, Physically-based facial modelling, analysis, and animation, The Journal of Visualization and Computer Animation, vol.1, issue.2, pp.73-80, 1990.
DOI : 10.1002/vis.4340010208

B. Theobald, J. A. Bangham, I. Matthews, and G. Cawley, Evaluation of a talking head based on appearance models, 2003.

T. Toda and K. Tokuda, A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis, IEICE Transactions on Information and Systems, vol.E90-D, issue.5, pp.816-824, 2007.
DOI : 10.1093/ietisy/e90-d.5.816

T. Toda, A. W. Black, and K. Tokuda, Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model, Speech Communication, vol.50, issue.3, pp.215-227, 2008.
DOI : 10.1016/j.specom.2007.09.001

K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000.
DOI : 10.1109/ICASSP.2000.861820

J. P. van Santen, Segmental duration and speech timing. Pages 225-249 of: Computing Prosody: Computational Models for Processing Spontaneous Speech, 1997.

K. Waters, A muscle model for animating three-dimensional facial expression, ACM SIGGRAPH Computer Graphics, vol.21, issue.4, pp.17-24, 1987.
DOI : 10.1145/37402.37405

C. Weiss, FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis, 2005.

D. H. Whalen, Coarticulation is largely planned, Journal of Phonetics, vol.18, issue.1, pp.3-35, 1990.

P. C. Woodland, J. J. Odell, V. Valtchev, and S. J. Young, Large vocabulary continuous speech recognition using HTK, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994.
DOI : 10.1109/ICASSP.1994.389562

E. Yamamoto, S. Nakamura, and K. Shikano, Subjective evaluation for HMM-based speech-to-lip movement synthesis, 1998.

H. C. Yehia, P. E. Rubin, and E. Vatikiotis-Bateson, Quantitative association of vocal-tract and facial behavior, Speech Communication, vol.26, issue.1-2, pp.23-43, 1998.
DOI : 10.1016/S0167-6393(98)00048-X

T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Duration modeling for HMM-based speech synthesis, 1998.

H. Zen, K. Tokuda, and T. Kitamura, An introduction of trajectory model into HMM-based speech synthesis, 2004.