
Maximum-likelihood linear regression coefficients as features for speaker recognition

Abstract : The goal of this thesis is to find new and efficient features for speaker recognition. We are mostly concerned with the use of the Maximum-Likelihood Linear Regression (MLLR) family of adaptation techniques as features in speaker recognition systems. MLLR transform coefficients are able to capture speaker cues after adaptation of a speaker-independent model using speech data. The resulting supervectors are high-dimensional, and no underlying model guiding their generation is assumed a priori, which makes them suitable for classification with SVMs. This thesis contributes to the speaker recognition field by proposing new approaches to feature extraction and by studying existing ones via experimentation on large corpora: 1. We propose a compact yet efficient system, MLLR-SVM, which tackles the issues of transcript- and language-dependency of the standard MLLR-SVM approach by using single-class Constrained MLLR (CMLLR) adaptation transforms together with Speaker Adaptive Training (SAT) of a Universal Background Model (UBM). 1- When fewer data samples than dimensions are available. 2- We propose several alternative representations of CMLLR transform coefficients based on the singular value and symmetric/skew-symmetric decompositions of transform matrices. 3- We develop a novel framework for feature-level inter-session variability compensation based on compensation of CMLLR transform supervectors via Nuisance Attribute Projection (NAP). 4- We perform a comprehensive experimental study of multi-class (C)MLLR-SVM systems along multiple axes, including front-end, type of transform, type of model, model training and number of transforms. 5- We compare CMLLR and MLLR transform matrices based on an analysis of properties of their singular values. 6- We propose the use of lattice-based MLLR as a way to cope with erroneous transcripts in MLLR-SVM systems using phonemic acoustic models.
Document type :
Theses

Cited literature [136 references]

https://tel.archives-ouvertes.fr/tel-00616673
Contributor : Magali Roserat-Brilhac
Submitted on : Tuesday, August 23, 2011 - 5:06:51 PM
Last modification on : Thursday, December 10, 2020 - 12:31:16 PM
Long-term archiving on : Monday, November 12, 2012 - 3:45:10 PM

Identifiers

  • HAL Id : tel-00616673, version 1

Citation

Marc Ferràs Font. Maximum-likelihood linear regression coefficients as features for speaker recognition. Linguistics. Université Paris Sud - Paris XI, 2009. English. ⟨tel-00616673⟩

Metrics

Record views

685

Files downloads

306