La bimodalité de la parole au secours de la séparation de sources

Abstract: This thesis addresses the joint modeling of the audio and visual modalities of speech and its use in source separation. A mixture of kernels is first proposed to model the bimodality of audiovisual speech. This model is then exploited to detect the silent phases of speech. In addition, we propose a purely visual silence detector based on the speaker's lip movements. The latter detector is robust to any acoustic environment. These two approaches are then exploited for source separation of convolutive mixtures. We first resolve the classical indeterminacies encountered by frequency-domain separation algorithms. We then propose a geometric separation method that exploits the silences of the source of interest. The proposed algorithms are validated by experiments on multi-speaker, multi-language databases.

https://tel.archives-ouvertes.fr/tel-00200871
Contributor: Bertrand Rivet
Submitted on: Friday, December 21, 2007 - 4:57:39 PM
Last modification on: Friday, November 6, 2020 - 4:36:17 AM
Long-term archiving on: Tuesday, April 13, 2010 - 3:02:31 PM

Identifiers

  • HAL Id: tel-00200871, version 1

Citation

Bertrand Rivet. La bimodalité de la parole au secours de la séparation de sources. Traitement du signal et de l'image [eess.SP]. Institut National Polytechnique de Grenoble - INPG, 2006. Français. ⟨tel-00200871⟩
