Theses

Modèles acoustiques à structure temporelle renforcée pour la vérification du locuteur embarquée (Acoustic models with reinforced temporal structure for embedded speaker verification)

Abstract: Speaker verification aims to accept or reject the claimed identity of a person using the characteristics of his or her speech. Integrating an automatic speaker verification engine into embedded devices must respect two types of constraints: limited hardware resources, such as memory and computational power, and limited amounts of speech for both training and test sequences. Current state-of-the-art systems do not take advantage of the temporal structure of speech. We propose to exploit this information through a user-customised framework in order to compensate for the short speech signals that are common in this scenario. A preliminary study evaluates the influence of text dependency on the state-of-the-art GMM/UBM (Gaussian Mixture Model / Universal Background Model) approach. By constraining this approach, which is usually dedicated to text-independent speaker recognition, we show that a lexical constraint yields a relative error-rate reduction of 30% when impostors do not know the client's password. We then introduce a specific acoustic architecture that takes advantage of the temporal structure of speech through a low-cost, user-customised password framework. This three-stage hierarchical architecture allows a layered specialization of the acoustic models. The upper layer, a classical UBM, models the general acoustic space. The middle layer contains the text-independent characteristics of each speaker; these text-independent speaker models are obtained by classical GMM/UBM adaptation. Each text-independent speaker model is then used to derive a left-right Semi-Continuous Hidden Markov Model (SCHMM) that captures the Temporal Structure Information (TSI) of the utterance chosen by that speaker. This TSI is shown to reduce the error rate by 60% when impostors do not know the client's password. To further reinforce the temporal structure of speech, we propose a new approach to speaker verification.
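The three layers described above can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians, mean-only MAP adaptation with a relevance factor (the common form of GMM/UBM adaptation; the abstract does not say which variant the thesis uses), and SCHMM states that share the adapted speaker Gaussians while keeping their own mixture weights. All function names, the relevance value and the toy data are hypothetical; scores are average per-frame log-likelihood ratios.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a GMM (all weights must be > 0)."""
    return logsumexp([
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Middle layer: mean-only MAP adaptation of the UBM to enrolment frames."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                        # soft occupation counts
    firsts = [[0.0] * dim for _ in range(k)]  # first-order statistics
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        total = logsumexp(logp)
        for c in range(k):
            gamma = math.exp(logp[c] - total)  # posterior of component c
            counts[c] += gamma
            for d in range(dim):
                firsts[c][d] += gamma * x[d]
    adapted = []
    for c in range(k):
        if counts[c] == 0.0:                   # unseen component: keep UBM mean
            adapted.append(list(means[c]))
            continue
        alpha = counts[c] / (counts[c] + relevance)
        adapted.append([alpha * firsts[c][d] / counts[c] + (1.0 - alpha) * means[c][d]
                        for d in range(dim)])
    return adapted


def llr_score(frames, weights, spk_means, ubm_means, variances):
    """Average per-frame log-likelihood ratio, speaker GMM versus UBM."""
    total = 0.0
    for x in frames:
        total += gmm_frame_loglik(x, weights, spk_means, variances)
        total -= gmm_frame_loglik(x, weights, ubm_means, variances)
    return total / len(frames)


def schmm_log_emissions(frames, state_weights, spk_means, variances):
    """Lower layer: SCHMM states share the speaker Gaussians, but each state
    has its own mixture weights. Returns log_emis[t][s]."""
    return [[gmm_frame_loglik(x, w, spk_means, variances) for w in state_weights]
            for x in frames]
```

For example, with a two-component, one-dimensional UBM whose means are -2 and 2 (unit variances, equal weights), adapting on four frames around 2.5 with `relevance=4.0` moves the second mean to roughly 2.25 and leaves the first almost unchanged; genuine frames near 2.5 then score above zero and frames near 1.5 below zero.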
The speech modality is reinforced with additional temporal information: synchronisation points extracted by an additional process are used to constrain the acoustic decoding. Such an additional modality could provide different structural information and help thwart impostor attacks such as playback. Thanks to the specific characteristics of our system, this aided decoding has an acceptable level of complexity. To tighten the loose synchronisation between states and frames that results from the SCHMM structure of the TSI modelling, we propose to embed external information in the audio decoding in the form of additional time constraints. This information is called external because it is intended to come from an independent process. Experiments were performed on the BIOMET part of the MyIdea database using external information obtained from an automatic phonetic alignment. We show that adding a synchronisation constraint to our acoustic approach reduces impostor scores and decreases the error rate by 20% when impostors do not know the client's password. In the other conditions, when impostors know the password, performance remains similar to the original baseline. Extracting the synchronisation constraint from a video stream appears difficult to reconcile with the limited resources of embedded devices. We present a first exploration of using a video stream to constrain the acoustic process; this simple video processing did not allow us to extract any relevant information.
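The aided decoding described above can be viewed as a Viterbi alignment through the speaker's left-right model in which an external source forbids some (frame, state) pairs. The sketch below assumes hard time windows per state (for instance derived from an automatic phonetic alignment) encoded as a boolean mask; the abstract does not say whether the thesis uses hard windows or a softer penalty, so `constrained_viterbi`, `mask_from_segments` and the segment format are hypothetical. Because the mask can only remove paths, the constrained score never exceeds the unconstrained one, which is one way impostor scores can drop.

```python
import math

NEG_INF = float("-inf")


def mask_from_segments(segments, num_frames):
    """Hypothetical format: segments[s] = (first_frame, last_frame) allowed
    for state s, e.g. taken from an external phonetic alignment."""
    return [[first <= t <= last for (first, last) in segments]
            for t in range(num_frames)]


def constrained_viterbi(log_emis, log_stay, log_next, allowed=None):
    """Viterbi alignment through a left-right HMM (self-loop or advance by one).

    log_emis[t][s] : log-likelihood of frame t in state s
    allowed[t][s]  : optional mask; False forbids state s at frame t
    Returns (best log-score, state path); the path starts in the first state
    and ends in the last one. An empty path means no admissible alignment.
    """
    num_frames, num_states = len(log_emis), len(log_emis[0])

    def ok(t, s):
        return allowed is None or allowed[t][s]

    delta = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    if ok(0, 0):
        delta[0][0] = log_emis[0][0]
    for t in range(1, num_frames):
        for s in range(num_states):
            if not ok(t, s):
                continue                      # pruned by the external constraint
            stay = delta[t - 1][s] + log_stay
            move = delta[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            best, back[t][s] = (stay, s) if stay >= move else (move, s - 1)
            if best > NEG_INF:
                delta[t][s] = best + log_emis[t][s]
    score = delta[-1][-1]
    if score == NEG_INF:
        return score, []
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return score, path[::-1]
```

In a toy case with six frames and three states whose emission scores favour the path 0, 1, 1, 2, 2, 2, imposing the windows (0, 1), (2, 3), (4, 5) forces the path 0, 0, 1, 1, 2, 2 and lowers the best log-score by 10. A hard mask is the simplest choice; a soft penalty on off-window states would be a natural alternative.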

https://tel.archives-ouvertes.fr/tel-00453645
Contributor: Abes Star
Submitted on: Friday, February 5, 2010 - 12:53:09 PM
Last modification on: Tuesday, January 14, 2020 - 10:38:05 AM
Archived on: Wednesday, November 30, 2016 - 1:01:26 PM

File

2009AVIG0170_0_0.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00453645, version 1

Citation

Anthony Larcher. Modèles acoustiques à structure temporelle renforcée pour la vérification du locuteur embarquée. Other [cs.OH]. Université d'Avignon, 2009. In French. ⟨NNT : 2009AVIG0170⟩. ⟨tel-00453645⟩

Metrics

Record views: 566
File downloads: 387