Abstract: This thesis addresses visual speech synthesis in the context of humanoid animation. We propose and implement control models for facial animation that generate articulatory trajectories from text, working with two audiovisual corpora. First, we compared the main state-of-the-art models objectively and subjectively. We then studied the spatial aspect of the articulatory targets generated by HMM-based synthesis and by concatenation-based synthesis, and proposed a new synthesis model, TDA (Task Dynamics for Animation), that combines the advantages of both approaches: the TDA system plans geometric targets with HMM synthesis and executes the computed targets by concatenating articulatory segments. Next, we studied the temporal aspect of speech synthesis and proposed a model named PHMM (Phased Hidden Markov Model). The PHMM manages the temporal relations between the different modalities of speech, computing the boundaries of articulatory gestures from the corresponding acoustic boundaries between allophones. It has also been applied to the automatic synthesis of French Cued Speech. Finally, we present a subjective evaluation of the different proposed systems (concatenation, HMM, PHMM and TDA).