Maximum-likelihood linear regression coefficients as features for speaker recognition

Marc Ferràs Font

Thesis, Year: 2009


Utilisation des coefficients de régression linéaire par maximum de vraisemblance comme paramètres pour la reconnaissance automatique du locuteur

Marc Ferràs Font (role: author), Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Abstract

The goal of this thesis is to find new and efficient features for speaker recognition. We are mostly concerned with the use of the Maximum-Likelihood Linear Regression (MLLR) family of adaptation techniques as features in speaker recognition systems. MLLR transform coefficients capture speaker cues after a speaker-independent model is adapted using speech data. The resulting supervectors are high-dimensional, and no underlying model guiding their generation is assumed a priori, which makes them suitable for classification with Support Vector Machines (SVMs). This thesis contributes to the speaker recognition field by proposing new approaches to feature extraction and by studying existing ones through experimentation on large corpora:

1. We propose a compact yet efficient MLLR-SVM system that tackles the transcript- and language-dependency of the standard MLLR-SVM approach by using single-class Constrained MLLR (CMLLR) adaptation transforms together with Speaker Adaptive Training (SAT) of a Universal Background Model (UBM).
   [...] when fewer data samples than dimensions are available.
2. We propose several alternative representations of CMLLR transform coefficients based on the singular value and symmetric/skew-symmetric decompositions of transform matrices.
3. We develop a novel framework for feature-level inter-session variability compensation based on compensation of CMLLR transform supervectors via Nuisance Attribute Projection (NAP).
4. We perform a comprehensive experimental study of multi-class (C)MLLR-SVM systems along multiple axes, including front-end, type of transform, type of model, model training and number of transforms.
5. We compare CMLLR and MLLR transform matrices based on an analysis of the properties of their singular values.
6. We propose the use of lattice-based MLLR as a way to cope with erroneous transcripts in MLLR-SVM systems using phonemic acoustic models.
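The symmetric/skew-symmetric decomposition and the NAP compensation mentioned in the abstract can be illustrated with a small sketch. It is not the thesis implementation: it assumes a CMLLR transform of the form x' = A x + b, a supervector built by simply stacking the rows of A and then b, and nuisance directions that are already orthonormal. The function names, the toy 2x2 transform and the single nuisance direction are all hypothetical; the abstract does not say how the supervector is laid out or how the nuisance subspace is estimated.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sym_skew_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Split a square matrix A into S = (A + A^T)/2 and K = (A - A^T)/2, so A = S + K."""
    n = len(a)
    sym = [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]
    skew = [[0.5 * (a[i][j] - a[j][i]) for j in range(n)] for i in range(n)]
    return sym, skew


def to_supervector(a: Matrix, b: Vector) -> Vector:
    """Stack the rows of A, then the bias b, into one flat vector (assumed layout)."""
    return [x for row in a for x in row] + list(b)


def nap_project(v: Vector, nuisance: List[Vector]) -> Vector:
    """Compute (I - U U^T) v, assuming the vectors in `nuisance` are orthonormal."""
    out = list(v)
    for u in nuisance:
        coeff = sum(ui * vi for ui, vi in zip(u, v))  # component of v along u
        out = [oi - coeff * ui for oi, ui in zip(out, u)]
    return out


# Toy transform (hypothetical values, not taken from the thesis).
A = [[1.0, 2.0],
     [0.0, 3.0]]
b = [0.5, -0.5]

sym, skew = sym_skew_decompose(A)
supervector = to_supervector(A, b)

# One hypothetical nuisance direction: the first supervector coordinate.
nuisance_dirs = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
compensated = nap_project(supervector, nuisance_dirs)
```

In this sketch the compensated supervector has no component left along the nuisance direction, which is the effect NAP is meant to have on session-dependent directions before the supervectors are passed to an SVM.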

Keywords

speech processing, speaker recognition, speaker adaptation, support vector machine, language processing, MLLR

Domains

Linguistics

Main file

these-ferras2009.pdf (2.29 MB)

Contributor: Magali Roserat-Brilhac

https://theses.hal.science/tel-00616673

Submitted on: Tuesday, 23 August 2011, 17:06:51

Last modified on: Saturday, 7 October 2023, 21:36:20

Long-term archiving on: Monday, 12 November 2012, 15:45:10

Dates and versions

tel-00616673, version 1 (23-08-2011)

Identifiers

HAL Id: tel-00616673, version 1

Cite

Marc Ferràs Font. Maximum-likelihood linear regression coefficients as features for speaker recognition. Linguistics. Université Paris Sud - Paris XI, 2009. English. ⟨NNT : ⟩. ⟨tel-00616673⟩


Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN

467 views

326 downloads
