Theses

Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems

Abstract : Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs, and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved major advances and outperformed GMM-HMM models on various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms developed for GMMs to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.

https://tel.archives-ouvertes.fr/tel-01797231
Contributor : Abes Star
Submitted on : Tuesday, May 22, 2018 - 2:29:06 PM
Last modification on : Tuesday, March 31, 2020 - 3:23:20 PM
Document(s) archived on : Monday, September 24, 2018 - 11:29:18 AM

File

2017LEMA1040.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01797231, version 1

Citation

Natalia Tomashenko. Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems. Neural and Evolutionary Computing [cs.NE]. Université du Maine; ITMO University, 2017. English. ⟨NNT : 2017LEMA1040⟩. ⟨tel-01797231⟩
