
Deep neural networks for source separation and noise-robust speech recognition

Aditya Arie Nugraha 1 
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This thesis addresses the problem of multichannel audio source separation by exploiting deep neural networks (DNNs). We build upon the classical expectation-maximization (EM) based source separation framework employing a multichannel Gaussian model, in which the sources are characterized by their power spectral densities and their spatial covariance matrices. We explore and optimize the use of DNNs for estimating these spectral and spatial parameters. Employing the estimated source parameters, we then derive a time-varying multichannel Wiener filter for the separation of each source. We extensively study the impact of various design choices for the spectral and spatial DNNs: cost functions, time-frequency representations, architectures, and training data sizes. The cost functions notably include a newly proposed task-oriented signal-to-distortion ratio cost function for spectral DNNs. Furthermore, we present a weighted spatial parameter estimation formula, which generalizes the corresponding exact EM formulation. On a singing-voice separation task, our systems perform remarkably close to the current state-of-the-art method and provide an improvement of up to 2 dB in the source-to-interference ratio. On a speech enhancement task, our systems outperform the state-of-the-art GEV-BAN beamformer by 14%, 7%, and 1% relative word error rate on 6-channel, 4-channel, and 2-channel data, respectively.
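The filtering step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard multichannel Gaussian model, in which each source j has covariance v_j(f,n) R_j(f) and the Wiener gain is that covariance multiplied by the inverse of the mixture covariance. This is not the thesis's implementation: the function name, argument layout, and the `eps` regularizer are hypothetical choices made for illustration, and in the thesis the spectral and spatial parameters come from DNNs and EM-style updates rather than being given directly.

```python
import numpy as np


def multichannel_wiener_filter(x, v, R, eps=1e-10):
    """Separate sources with a time-varying multichannel Wiener filter.

    Hypothetical interface (not taken from the thesis):
      x : (F, N, I) complex STFT of the I-channel mixture
      v : (J, F, N) nonnegative power spectral densities of the J sources
      R : (J, F, I, I) Hermitian spatial covariance matrices
    Returns the (J, F, N, I) estimated multichannel source images.
    """
    n_channels = x.shape[-1]
    # Per-source covariance: Sigma_j(f, n) = v_j(f, n) * R_j(f)
    sigma_src = v[..., None, None] * R[:, :, None, :, :]
    # Mixture covariance is the sum over sources; eps keeps it invertible
    sigma_mix = sigma_src.sum(axis=0) + eps * np.eye(n_channels)
    # Wiener gain: W_j(f, n) = Sigma_j(f, n) @ inv(Sigma_x(f, n))
    gains = sigma_src @ np.linalg.inv(sigma_mix)
    # Apply each gain matrix to the mixture vector at every (f, n) bin
    return np.einsum("jfnab,fnb->jfna", gains, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, F, N, I = 2, 3, 4, 2
    # Random, well-conditioned Hermitian matrices stand in for estimated R_j(f)
    A = rng.normal(size=(J, F, I, I)) + 1j * rng.normal(size=(J, F, I, I))
    R = A @ A.conj().transpose(0, 1, 3, 2) + 0.1 * np.eye(I)
    v = rng.uniform(0.5, 2.0, size=(J, F, N))
    x = rng.normal(size=(F, N, I)) + 1j * rng.normal(size=(F, N, I))
    estimates = multichannel_wiener_filter(x, v, R)
    print(estimates.shape)
```

Because the gains of all sources sum to (approximately) the identity matrix, the estimated source images add back up to the mixture, which is a convenient sanity check for this kind of filter.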

Cited literature : 249 references
Contributor : ABES STAR
Submitted on : Monday, January 15, 2018 - 4:56:28 PM
Last modification on : Thursday, April 14, 2022 - 3:36:04 PM
Long-term archiving on : Sunday, May 6, 2018 - 10:33:27 AM


Version validated by the jury (STAR)


  • HAL Id : tel-01684685, version 1


Aditya Arie Nugraha. Deep neural networks for source separation and noise-robust speech recognition. Signal and Image Processing. Université de Lorraine, 2017. English. ⟨NNT : 2017LORR0212⟩. ⟨tel-01684685⟩


