Mixing models for multichannel audio source separation in reverberant environments

Abstract: In this thesis we address the problem of audio source separation for multichannel mixtures recorded in a reverberant environment. Our work focuses on the under-determined case, that is, when the number of sources to be separated is greater than the number of channels in the mixture. To tackle such a problem, it is often useful to develop a parametric model that explains the observed data. We adopt a probabilistic and hierarchical approach in which the modeling of the monophonic source signals is distinguished from that of the mixing process. The sources are characterized in a time-frequency domain in order to obtain a sparse representation, which is well suited to modeling because it highlights the specific structure of audio signals, and of musical signals in particular. We rely on a probabilistic model of the sources in which their time-frequency coefficients are represented as latent random variables. Defining the source model then amounts to defining the prior joint distribution of these coefficients. The source models in this thesis are mainly based on the Gaussian and Student’s t distributions. We also use non-negative matrix factorization (NMF) approaches; one advantage of this low-rank technique is that it reduces the number of parameters to be estimated.
The main contributions of this thesis concern the modeling of the mixture in the presence of reverberation. Such a mixture is naturally represented in the time domain by the convolution of the source signals with the room impulse responses, which characterize the acoustic path between each source and each microphone. In the context of source separation, these responses are called mixing filters. They are generally treated in the literature as deterministic parameters, estimated from the observed data alone. However, since they are room responses, they have a very specific structure that could be exploited to guide their estimation.
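For the source model described earlier in this abstract, the equations are not given here; a formulation widely used in this literature, and assumed in the following sketch, treats each source STFT coefficient s_fn as a zero-mean circular complex Gaussian whose variance comes from an NMF, v_fn = [WH]_fn. The sizes F, N, K and the random factors are hypothetical; the point is the likelihood of the latent coefficients and the parameter saving of the low-rank variance.

```python
import numpy as np

# Hypothetical sizes: F frequency bins, N time frames, K NMF components.
F, N, K = 513, 400, 10
rng = np.random.default_rng(0)

# Non-negative factors: W holds spectral templates, H their temporal activations.
W = rng.gamma(shape=1.0, scale=1.0, size=(F, K))
H = rng.gamma(shape=1.0, scale=1.0, size=(K, N))
V = W @ H  # variance v_fn of each time-frequency coefficient, shape (F, N)

# Latent coefficients s_fn ~ N_c(0, v_fn), independent across (f, n).
# A circular complex Gaussian of variance v has real and imaginary parts
# of variance v / 2 each.
S = np.sqrt(V / 2) * (rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N)))


def neg_log_likelihood(S, V):
    """-log p(S | V) for independent N_c(0, v_fn) coefficients."""
    return float(np.sum(np.abs(S) ** 2 / V + np.log(np.pi * V)))


# Free variance parameters: one per coefficient versus the NMF factors.
full_params = F * N        # 205200
nmf_params = K * (F + N)   # 9130
print(full_params, nmf_params, neg_log_likelihood(S, V))
```

As a function of V, this negative log-likelihood equals the Itakura-Saito divergence between |S|² and V up to an additive constant, which is why Itakura-Saito NMF is commonly paired with this Gaussian model.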
In a first part, we consider an approximation common in the literature, which consists in approximating the time-domain convolution by a simple multiplication in the short-time Fourier transform (STFT) domain, under the assumption that the impulse responses of the mixing filters are short. The mixture is then characterized by the frequency response of the filters. Building on concepts from geometrical room acoustics, we model the direct path and the early echoes of the room response by an autoregressive process in the frequency domain. Following results from statistical room acoustics, late reverberation is modeled as a Gaussian random process, also in the frequency domain. We exploit the exponential temporal decay of late reverberation to obtain theoretical expressions for the autocovariance function and the power spectral density of this process. We also propose an autoregressive moving-average (ARMA) parametrization of these two quantities. Finally, we develop a source separation method based on an expectation-maximization (EM) algorithm that exploits priors on the mixing filters in order to perform maximum a posteriori (MAP) estimation.
In a second part, we relax the short mixing filter assumption, because it fundamentally limits separation performance for highly reverberant mixtures. We propose to infer the time-frequency source coefficients from the time-domain mixture observations using a variational method. This approach makes it possible to represent the convolutive mixing process exactly, in the time domain. Preliminary results obtained assuming known mixing filters show the robustness of this approach in the presence of strong reverberation. We then develop a room impulse response model based on the Student’s t distribution. This distribution makes it possible to account for the direct path and the early echoes which, from a statistical point of view, are outliers with respect to the Gaussian reverberation model with exponentially decaying amplitude.
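The late-reverberation argument above can be checked numerically under a simplified discrete model, which is an assumption of this sketch and not necessarily the thesis's exact formulation: a tail h[t] = σ e^{-αt} w[t], t = 0, …, L-1, with w white standard Gaussian. Its DFT H[k] is then a Gaussian process over frequency whose autocovariance E[H[k+m] H*[k]] = σ² Σ_t e^{-2αt} e^{-j2πmt/L} depends only on the lag m; summing the geometric series gives the closed form in `theoretical_autocov`. The values of L, α, σ and the number of draws are hypothetical.

```python
import numpy as np

# Hypothetical values: tail length L (samples), decay rate alpha (per sample),
# initial standard deviation sigma, and number of Monte Carlo draws.
L, alpha, sigma, n_trials = 512, 0.01, 1.0, 2000
rng = np.random.default_rng(0)

# Late reverberation: white Gaussian noise under an exponentially decaying envelope.
t = np.arange(L)
h = sigma * np.exp(-alpha * t) * rng.standard_normal((n_trials, L))
Hf = np.fft.fft(h, axis=1)  # frequency responses on L DFT bins


def empirical_autocov(Hf, m):
    """Estimate E[H[k+m] conj(H[k])], averaging over draws and over all bins k."""
    # np.roll(Hf, -m)[k] == Hf[(k + m) mod L]; DFT bins are periodic in k.
    return np.mean(np.roll(Hf, -m, axis=1) * np.conj(Hf))


def theoretical_autocov(m, L, alpha, sigma):
    """sigma^2 (1 - e^{-2 alpha L}) / (1 - e^{-2 alpha - j 2 pi m / L})."""
    q = np.exp(-2 * alpha - 2j * np.pi * m / L)
    return sigma**2 * (1 - np.exp(-2 * alpha * L)) / (1 - q)


for m in (0, 1, 2, 5):
    print(m, empirical_autocov(Hf, m), theoretical_autocov(m, L, alpha, sigma))
```

Because the autocovariance does not depend on k, the process is stationary along frequency, and its power spectral density with respect to the dual, time-like variable is simply the energy envelope σ² e^{-2αt}. This finite, discrete idealization is only a sketch and need not coincide with the expressions derived in the thesis.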
Finally, we develop a source separation method based on variational inference in which the mixing filters are treated as latent random variables in the time domain. We also show that this approach makes it possible to adapt the time-frequency representation to each individual source in the mixture, in particular in terms of resolution.
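Treating the time-domain filter taps as latent random variables requires a prior on them. A standard identity, assumed here as the way to write the Student’s t model mentioned above, expresses a Student’s t variable as a Gaussian scale mixture: conditionally on a Gamma-distributed precision, each tap is Gaussian, a representation commonly used because it keeps variational updates tractable. The sketch draws taps from such a prior with an exponentially decaying scale, then compares how a Gaussian and a Student’s t density with the same decaying scale score a few large isolated taps standing in for a direct path and early echoes. L, α, ν and the spike positions and amplitudes are hypothetical.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: filter length L, decay rate alpha, degrees of freedom nu.
L, alpha, nu = 1024, 0.005, 3.0
scale = np.exp(-alpha * np.arange(L))  # exponentially decaying tap scale


def sample_student_taps(rng, scale, nu):
    """Draw a[t] = scale[t] * z[t] / sqrt(lam[t]), lam ~ Gamma(shape nu/2, rate nu/2).

    Conditionally on lam each tap is Gaussian; marginally it is Student's t
    with nu degrees of freedom and scale scale[t].
    """
    lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=scale.shape)
    return scale * rng.standard_normal(scale.shape) / np.sqrt(lam)


def gauss_logpdf(x, s):
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * (x / s) ** 2


def student_logpdf(x, s, nu):
    c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return c - np.log(s) - (nu + 1) / 2 * np.log1p((x / s) ** 2 / nu)


prior_draw = sample_student_taps(rng, scale, nu)

# Hypothetical filter: Gaussian tail plus a direct path and two early echoes.
spikes = [0, 40, 95]
a = scale * rng.standard_normal(L)
a[spikes] += [25.0, 10.0, -8.0]

print("Gaussian log-density of spikes:", gauss_logpdf(a[spikes], scale[spikes]).sum())
print("Student's t log-density of spikes:", student_logpdf(a[spikes], scale[spikes], nu).sum())
```

The Gaussian penalty on a large tap grows quadratically with its amplitude, whereas the Student’s t penalty grows only logarithmically, which is the sense in which the direct path and early echoes are tolerated as outliers.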

Cited literature: 172 references

https://tel.archives-ouvertes.fr/tel-01721933
Contributor: Simon Leglaive
Submitted on: Thursday, October 25, 2018 - 5:40:30 PM
Last modification on: Thursday, October 17, 2019 - 12:36:55 PM
Long-term archiving on: Saturday, January 26, 2019 - 4:05:03 PM

File

these_Simon_Leglaive.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-01721933, version 2

Citation

Simon Leglaive. Modèles de mélange pour la séparation multicanale de sources sonores en milieu réverbérant [Mixing models for multichannel audio source separation in reverberant environments]. Signal and Image Processing [eess.SP]. Télécom ParisTech, 2017. In French. ⟨tel-01721933v2⟩