Reconnaissance automatique du locuteur par des GMM à grande marge (Automatic speaker recognition using large-margin GMMs)

Reda Jourani 1
1 SAMoVA - Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio
IRIT - Institut de recherche en informatique de Toulouse
Abstract: Most state-of-the-art speaker recognition systems are based on Gaussian Mixture Models (GMM) trained using maximum likelihood estimation and maximum a posteriori (MAP) estimation. However, this generative training does not directly optimize classification performance. For this reason, discriminative models such as Support Vector Machines (SVM) have been an interesting alternative: they address the classification problem directly and perform well. Recently, a new discriminative approach to multiway classification has been proposed, the Large Margin Gaussian mixture models (LM-GMM). As with SVM, the parameters of LM-GMM are trained by solving a convex optimization problem. They differ from SVM, however, in using ellipsoids to model the classes directly in the input space, instead of half-spaces in an extended high-dimensional space. While LM-GMM have been used in speech recognition, to the best of our knowledge they have not been used in speaker recognition. In this thesis, we propose the LM-dGMM models: simplified, fast and more efficient versions of LM-GMM that exploit the properties and characteristics of speaker recognition applications and systems. In our LM-dGMM modeling, each class is initially modeled by a GMM trained by MAP adaptation of a Universal Background Model (UBM), or directly initialized by the UBM. The models' mean vectors are then re-estimated under Large Margin constraints. We carried out experiments on full speaker recognition tasks under the NIST-SRE 2006 core condition. The experimental results are very satisfactory and show that our Large Margin modeling approach is very promising.
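The initialization step mentioned in the abstract, a speaker GMM obtained by MAP adaptation of a UBM, can be illustrated with a minimal sketch. It assumes the commonly used relevance-MAP update of the means only, with diagonal covariances; the function name `map_adapt_means`, the argument names, and the relevance factor of 16 are illustrative assumptions and are not taken from the thesis. The large-margin re-estimation of the means is not shown.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means of a diagonal-covariance GMM.

    X:         (T, D) feature frames of one speaker
    weights:   (K,)   UBM mixture weights
    means:     (K, D) UBM mean vectors
    variances: (K, D) UBM diagonal covariances
    relevance: assumed relevance factor (hypothetical default, not from the thesis)
    """
    # Log-density of every frame under every Gaussian: shape (T, K).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Posterior probability of each component given each frame.
    log_post = np.log(weights) + log_gauss
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)

    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                           # (K,)
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]  # (K, D)

    # Interpolate between the data mean and the UBM mean per component.
    alpha = (n / (n + relevance))[:, None]
    return alpha * ex + (1.0 - alpha) * means
```

With this update, components that see many frames of the speaker move toward the speaker's data, while components that see few frames stay close to the UBM means.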

Cited literature: 191 references

https://tel.archives-ouvertes.fr/tel-00807563
Contributor: Reda Jourani
Submitted on: Wednesday, April 3, 2013 - 7:58:58 PM
Last modification on: Friday, January 10, 2020 - 9:10:16 PM
Long-term archiving on: Thursday, July 4, 2013 - 4:12:15 AM

Identifiers

  • HAL Id : tel-00807563, version 1

Citation

Reda Jourani. Reconnaissance automatique du locuteur par des GMM à grande marge. Traitement du signal et de l'image [eess.SP]. Université Paul Sabatier - Toulouse III, 2012. Français. ⟨tel-00807563⟩
