Modeling of Convolutive Audio Mixtures Applied to Source Separation

Ngoc Duong 1
1 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : We consider the task of reverberant audio source separation in both the under-determined and determined cases, that is, the extraction of the signal of each source from a multichannel audio mixture. We propose a general Gaussian modeling framework in which the contribution of each source to all mixture channels in the time-frequency domain is modeled as a zero-mean Gaussian random variable whose covariance encodes both the spatial and the spectral characteristics of the source. To better account for the reverberant mixing process, we relax the conventional narrowband assumption, which results in a rank-1 spatial covariance, and compute the upper bound on the separation performance achievable with a full-rank spatial covariance. Experimental results indicate an improvement of up to 6 dB in Signal-to-Distortion Ratio (SDR) in moderately to highly reverberant conditions, which supports this generalization. We also consider the use of quadratic time-frequency representations and of the auditory-motivated equivalent rectangular bandwidth (ERB) frequency scale to increase the amount of exploitable information and decrease the overlap between the sources in the input representation. After this theoretical validation of the proposed framework, we focus on estimating the model parameters from a given mixture signal in a practical blind source separation scenario. We derive a family of Expectation-Maximization (EM) algorithms to estimate the parameters in either the maximum likelihood (ML) or the maximum a posteriori (MAP) sense. We propose a family of spatial location priors inspired by the theory of room acoustics as well as a spatial continuity prior, and we investigate the use of two spectral priors previously used in a single-channel or rank-1 multichannel context, namely spectral continuity and Nonnegative Matrix Factorization (NMF).
The source separation results given by the proposed approach are compared with those of several baseline and state-of-the-art algorithms on both simulated mixtures and real-world recordings in various scenarios.
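
The Gaussian model summarized in the abstract can be illustrated with a minimal sketch. It assumes, as one common way of making the covariance encode both spectral and spatial characteristics, that the covariance of source j in time-frequency bin (n, f) factors as a scalar variance v_j(n, f) times a spatial covariance matrix R_j(f), and that the sources are recovered by multichannel Wiener filtering once these parameters are known. The function name, array shapes, and the small diagonal loading term are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def wiener_separate(x, v, R, eps=1e-12):
    """Multichannel Wiener filtering under a zero-mean Gaussian model.

    Assumed (hypothetical) shapes:
      x: (F, N, I)    complex mixture STFT, I channels
      v: (J, F, N)    nonnegative time-varying source variances
      R: (J, F, I, I) Hermitian positive semidefinite spatial covariances
    Returns the estimated source contributions, shape (J, F, N, I).
    """
    # Per-source covariance: Sigma_j(n, f) = v_j(n, f) * R_j(f)
    sigma = v[:, :, :, None, None] * R[:, :, None, :, :]  # (J, F, N, I, I)
    # Sources are assumed independent, so mixture covariances add up
    sigma_x = sigma.sum(axis=0)                           # (F, N, I, I)
    # Small diagonal loading keeps the inversion numerically stable
    inv_x = np.linalg.inv(sigma_x + eps * np.eye(x.shape[-1]))
    # Wiener gain per source and bin: W_j = Sigma_j Sigma_x^{-1}
    W = sigma @ inv_x                                     # (J, F, N, I, I)
    # Apply the gain to the mixture vector of each bin
    return np.einsum("jfnab,fnb->jfna", W, x)
```

In this sketch a rank-1 spatial covariance corresponds to R_j(f) = h_j(f) h_j(f)^H for a single mixing vector h_j(f), whereas the full-rank case lets R_j(f) be any positive definite matrix. Because the Wiener gains sum to (approximately) the identity, the estimated source contributions add back up to the mixture.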

https://tel.archives-ouvertes.fr/tel-00667117
Contributor : Ngoc Duong
Submitted on : Monday, February 6, 2012 - 11:36:34 PM
Last modification on : Friday, November 16, 2018 - 1:23:20 AM
Long-term archiving on : Monday, May 7, 2012 - 2:30:13 AM

Identifiers

  • HAL Id : tel-00667117, version 1

Citation

Ngoc Duong. Modeling of Convolutive Audio Mixtures Applied to Source Separation. Signal and Image Processing. Université Rennes 1, 2011. English. ⟨tel-00667117⟩
