Parole de locuteur : performance et confiance en identification biométrique vocale

Abstract : This thesis explores the use of speech as a biometric. Speech is subject to many constraints that depend on the speaker's origins (geographical, social and cultural) and also on the speaker's performative goals. The speaker can therefore be regarded as one factor of variation in speech, among others. In this work, we present some answers to the following two questions:
- Are all speech samples equivalent for recognizing a speaker?
- How are the different acoustic cues that carry information about the speaker structured?

In a first step, we present a protocol, based on NIST-HASR 2010 data, to assess the human ability to discriminate a speaker from a speech sample. This task is difficult for our listeners, whether naive or experienced. In this context, neither (quasi-)unanimity among listeners nor their self-assessment guarantees that the submitted answer is correct.

In a second step, the influence of the choice of speech sample on the performance of automatic systems is quantified using two databases, NIST and BREF, and two automatic speaker recognition (RAL) systems: Alize/SpkDet (LIA, UBM-GMM system) and Idento (SRI, i-vector system). Depending on the training file chosen for each speaker, both systems show significant differences in performance, measured with Vr, a measure of relative variation around the average EER (on NIST, Vr = 1.41 for Idento and Vr = 1.47 for Alize/SpkDet; on BREF, Vr = 3.11). These very large variations in performance show how sensitive automatic systems are to the speech sample. This sensitivity must be measured to make the systems more reliable.

To explain the importance of the choice of speech sample and to find the relevant cues, the effect of the speaker on the variance of various acoustic features is measured (η²). F0 depends strongly on the speaker, independently of the vowel. Some phonemes are more discriminative than others: nasal consonants, fricatives, nasal vowels, and oral vowels from half-closed to open.

This work is a first step towards understanding where the speaker is located in speech, using both human perception and automatic systems. Although we have shown a cepstral difference between the more and the less efficient models, it remains to understand how to relate the speaker to speech production. Finally, following this work, we wish to explore in more detail the influence of language on speaker recognition. Even if our results indicate that the same categories of phonemes carry information about the speaker in American English and in French, this remains to be confirmed on other languages.
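The abstract does not state how η² was computed. As a minimal sketch, assuming the standard one-way definition (between-speaker sum of squares divided by total sum of squares), the effect of the speaker on one acoustic feature could be estimated as follows. The function name `eta_squared` and the F0 values are hypothetical and serve only as an illustration; they are not data from the thesis.

```python
from collections import defaultdict


def eta_squared(values, groups):
    """One-way eta squared: SS_between / SS_total.

    values: feature measurements (e.g. F0 in Hz), one per observation.
    groups: group label (e.g. speaker id) for each observation.
    """
    if len(values) != len(groups) or not values:
        raise ValueError("values and groups must be non-empty and of equal length")

    grand_mean = sum(values) / len(values)

    # Collect the observations belonging to each group (speaker).
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Total variability around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("all values are identical; eta squared is undefined")

    # Variability of the group means around the grand mean,
    # weighted by the number of observations in each group.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical F0 values (Hz) for two speakers, two vowels each.
f0 = [100.0, 110.0, 200.0, 210.0]
speakers = ["spk_a", "spk_a", "spk_b", "spk_b"]
print(eta_squared(f0, speakers))
```

A value close to 1 means that most of the variance of the feature is explained by the grouping factor (here, the speaker); a value close to 0 means the factor explains almost none of it.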

Contributor : Abes Star
Submitted on : Friday, August 29, 2014 - 3:12:08 PM
Last modification on : Friday, March 22, 2019 - 11:34:06 AM
Long-term archiving on : Sunday, November 30, 2014 - 10:47:16 AM


Version validated by the jury (STAR)


HAL Id : tel-00995071, version 2



Juliette Kahn. Parole de locuteur : performance et confiance en identification biométrique vocale. Other [cs.OH]. Université d'Avignon, 2011. French. ⟨NNT : 2011AVIG0187⟩. ⟨tel-00995071v2⟩


