
La bimodalité de la parole au secours de la séparation de sources (Speech bimodality to the rescue of source separation)

Abstract: This thesis is dedicated to the joint modeling of the audio and visual modalities of speech and to the use of this model in source separation. A mixture of kernels is first proposed to model the bimodality of audiovisual speech. This model is then used to detect the silent phases of speech. In addition, we propose a purely visual silence detector based on the speaker's lip movements; this detector is robust to any acoustic environment. Both approaches are then exploited for the source separation of convolutive mixtures. We first resolve the classical indeterminacies encountered by frequency-domain separation algorithms. We then propose a geometric separation method that exploits the silent phases of the source of interest. The proposed algorithms are validated by experiments on multi-speaker and multilingual databases.
Contributor: Bertrand Rivet
Submitted on: Friday, December 21, 2007, 4:57:39 PM
Last modification on: Friday, November 6, 2020, 4:36:17 AM
Long-term archiving on: Tuesday, April 13, 2010, 3:02:31 PM


HAL Id: tel-00200871, version 1



Bertrand Rivet. La bimodalité de la parole au secours de la séparation de sources. Signal and Image Processing [eess.SP]. Institut National Polytechnique de Grenoble - INPG, 2006. In French. ⟨tel-00200871⟩


