
Reconnaissance automatique du locuteur par des GMM à grande marge (Automatic speaker recognition using large margin GMMs)

Reda Jourani
Abstract : Most state-of-the-art speaker recognition systems are based on Gaussian Mixture Models (GMM) trained using maximum likelihood estimation and maximum a posteriori (MAP) estimation. However, generative training of the GMM does not directly optimize classification performance. For this reason, discriminative models such as Support Vector Machines (SVM) have been an interesting alternative: they address the classification problem directly and perform well. Recently, a new discriminative approach for multiway classification has been proposed: Large Margin Gaussian mixture models (LM-GMM). As with SVM, the parameters of LM-GMM are trained by solving a convex optimization problem. They differ from SVM, however, in that they use ellipsoids to model the classes directly in the input space, instead of half-spaces in an extended high-dimensional space. While LM-GMM have been used in speech recognition, to the best of our knowledge they have not been used in speaker recognition. In this thesis, we propose the LM-dGMM models: simplified, fast and more efficient versions of LM-GMM that exploit the properties and characteristics of speaker recognition applications and systems. In our LM-dGMM modeling, each class is initially modeled by a GMM trained by MAP adaptation of a Universal Background Model (UBM), or directly initialized by the UBM. The models' mean vectors are then re-estimated under Large Margin constraints. We carried out experiments on full speaker recognition tasks under the NIST-SRE 2006 core condition. The experimental results are very satisfactory and show that our Large Margin modeling approach is very promising.
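The initialization step mentioned in the abstract, MAP adaptation of a UBM, can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes the commonly used relevance-factor form of MAP adaptation applied to the means only, diagonal covariances, and a relevance factor of 16. The function names (`log_gauss_diag`, `posteriors`, `map_adapt_means`) are hypothetical, and the large margin re-estimation of the means is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return speaker means obtained by MAP adaptation of the UBM means.

    Assumption: only the means are adapted; the weights and variances
    remain those of the UBM. `relevance` is an assumed default.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft counts n_k
    sums = [[0.0] * dim for _ in range(n_comp)]  # sum_t gamma_t(k) * x_t
    for x in frames:
        for k, gamma in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    # Equivalent to alpha_k * E_k[x] + (1 - alpha_k) * mu_k
    # with alpha_k = n_k / (n_k + relevance).
    return [
        [
            (sums[k][d] + relevance * means[k][d]) / (counts[k] + relevance)
            for d in range(dim)
        ]
        for k in range(n_comp)
    ]
```

With this form, a component that sees little speaker data keeps a mean close to the UBM mean, while a component with many assigned frames moves toward the speaker's data.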

Cited literature [191 references]
Contributor : Reda Jourani
Submitted on : Wednesday, April 3, 2013 - 7:58:58 PM
Last modification on : Monday, July 4, 2022 - 8:46:56 AM
Long-term archiving on : Thursday, July 4, 2013 - 4:12:15 AM


  • HAL Id : tel-00807563, version 1


Reda Jourani. Reconnaissance automatique du locuteur par des GMM à grande marge. Traitement du signal et de l'image [eess.SP]. Université Paul Sabatier - Toulouse III, 2012. Français. ⟨tel-00807563⟩


