Deep neural networks for source separation and noise-robust speech recognition

Aditya Arie Nugraha 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This thesis addresses the problem of multichannel audio source separation by exploiting deep neural networks (DNNs). We build upon the classical expectation-maximization (EM) based source separation framework employing a multichannel Gaussian model, in which the sources are characterized by their power spectral densities and their spatial covariance matrices. We explore and optimize the use of DNNs for estimating these spectral and spatial parameters. Employing the estimated source parameters, we then derive a time-varying multichannel Wiener filter for the separation of each source. We extensively study the impact of various design choices for the spectral and spatial DNNs: cost functions, time-frequency representations, architectures, and training data sizes. The cost functions notably include a newly proposed task-oriented signal-to-distortion ratio cost function for spectral DNNs. Furthermore, we present a weighted spatial parameter estimation formula, which generalizes the corresponding exact EM formulation. On a singing-voice separation task, our systems perform remarkably close to the current state-of-the-art method and provide up to 2 dB improvement of the source-to-interference ratio. On a speech enhancement task, our systems outperform the state-of-the-art GEV-BAN beamformer, with relative word error rate improvements of 14%, 7%, and 1% on 6-channel, 4-channel, and 2-channel data, respectively.
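As a rough illustration of the separation step described in the abstract, the sketch below applies a time-varying multichannel Wiener filter under the multichannel Gaussian model, where each source j has a power spectral density v_j(f,n) and a spatial covariance matrix R_j(f). It is not taken from the thesis: the function name, array shapes, and the small regularization term `eps` are assumptions made for this example, and the random inputs merely stand in for the DNN-estimated spectral parameters and the EM-estimated spatial parameters.

```python
import numpy as np

def multichannel_wiener_filter(x, v, R, eps=1e-10):
    """Hypothetical helper (not from the thesis).

    x: (F, N, I) complex mixture STFT with I channels
    v: (J, F, N) nonnegative power spectral densities of J sources
    R: (J, F, I, I) Hermitian spatial covariance matrices
    Returns c: (J, F, N, I) estimated multichannel source images.
    """
    n_channels = x.shape[-1]
    # Source covariances: Sigma_j(f, n) = v_j(f, n) * R_j(f)
    sigma_src = v[..., None, None] * R[:, :, None, :, :]
    # Mixture covariance: sum over sources (eps * I is an assumed regularizer)
    sigma_mix = sigma_src.sum(axis=0) + eps * np.eye(n_channels)
    # Wiener filter per source: W_j(f, n) = Sigma_j(f, n) @ inv(Sigma_x(f, n))
    W = sigma_src @ np.linalg.inv(sigma_mix)
    # Apply the filter to the mixture: c_j(f, n) = W_j(f, n) @ x(f, n)
    return np.einsum("jfnab,fnb->jfna", W, x)

# Toy usage with random placeholders instead of estimated parameters
rng = np.random.default_rng(0)
J, F, N, I = 2, 4, 5, 3
x = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))
v = rng.uniform(0.1, 1.1, size=(J, F, N))
A = rng.standard_normal((J, F, I, I)) + 1j * rng.standard_normal((J, F, I, I))
# A @ A^H + identity gives Hermitian positive-definite matrices
R = A @ np.conj(np.swapaxes(A, -1, -2)) + np.eye(I)
c = multichannel_wiener_filter(x, v, R)
```

Because the per-source filters sum to (approximately) the identity, the estimated source images in `c` add back up to the mixture `x`, which gives a quick sanity check for any such implementation.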
Document type :
Theses

Cited literature [249 references]

https://tel.archives-ouvertes.fr/tel-01684685
Contributor : Abes Star
Submitted on : Monday, January 15, 2018 - 4:56:28 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on: Sunday, May 6, 2018 - 10:33:27 AM

File

DDOC_T_2017_0212_ADITYA_ARIE_N...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01684685, version 1

Citation

Aditya Arie Nugraha. Deep neural networks for source separation and noise-robust speech recognition. Signal and Image Processing. Université de Lorraine, 2017. English. ⟨NNT : 2017LORR0212⟩. ⟨tel-01684685⟩

Metrics

  • Record views : 717
  • File downloads : 3778