
Diagnostic pour la combinaison de systèmes de reconnaissance automatique de la parole.

Abstract : Automatic Speech Recognition (ASR) is affected by many sources of variability in the speech signal. Despite sophisticated techniques, a single ASR system usually cannot account for all of them. We propose to use several sources of acoustic information in order to increase precision and robustness.

The combination of several acoustic feature sets is motivated by the assumption that some characteristics de-emphasized by one feature set are emphasized by another. The goal is therefore to make the most of the strengths of each. In addition, acoustic models partition the acoustic space differently, so they can be used in a combination scheme that relies on their complementarity.

Diagnosis is at the core of this research. Analyzing the performance of each feature set brings out specific contexts in which the recognition result can be predicted. We propose a diagnosis architecture in which the ASR system is modeled as a "channel model" that takes as input the phonemes present in the speech signal and outputs the phoneme hypotheses produced by the system. This architecture allows the different sources of confusion within the recognition system to be separated. These analyses enable the introduction of post-decoding combination strategies at a high segmental level (word or phoneme).
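
As an illustration only, the "channel model" view can be sketched by estimating, from aligned phoneme pairs, how often each reference phoneme is output as each hypothesis phoneme. The function name `estimate_channel`, the toy alignment, and the use of simple relative frequencies are assumptions made for this sketch; the abstract does not specify how the channel is estimated.

```python
from collections import Counter, defaultdict


def estimate_channel(aligned_pairs):
    """Estimate P(hypothesis | reference) by relative frequency.

    aligned_pairs: iterable of (reference_phoneme, hypothesis_phoneme)
    tuples, assumed to come from an alignment of the decoder output
    with the reference transcription (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        counts[ref][hyp] += 1

    channel = {}
    for ref, hyp_counts in counts.items():
        total = sum(hyp_counts.values())
        channel[ref] = {hyp: n / total for hyp, n in hyp_counts.items()}
    return channel


# Hypothetical toy alignment, not data from the thesis.
pairs = [
    ("a", "a"), ("a", "a"), ("a", "e"),
    ("p", "p"), ("p", "b"), ("p", "p"), ("p", "p"),
]
channel = estimate_channel(pairs)
```

Each row of `channel` is a confusion distribution for one input phoneme, which is one simple way to inspect where a given feature set tends to confuse phonemes.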

We also propose combining the a posteriori probabilities of the states of a Hidden Markov Model (HMM) given a feature frame. To better estimate these a posteriori probabilities, the probabilities obtained with several acoustic models are fused. For consistency, the acoustic models must have equivalent topologies. Consequently, we propose a new fast and efficient protocol for training models that have the same topology but use different acoustic feature sets. Several methods for estimating weighting factors and for generating complementary acoustic models for combination are also suggested.
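
A minimal sketch of frame-level posterior fusion follows, assuming that every acoustic model shares the same state inventory (so state index `s` means the same HMM state in each model) and that the weights are given. The linear and log-linear rules shown are two common fusion choices used here for illustration; the abstract does not state which rule or weight estimation method the thesis uses, and all names and values are hypothetical.

```python
import math


def linear_fusion(posteriors, weights):
    """Weighted sum of per-model state posteriors for one frame."""
    n_states = len(posteriors[0])
    fused = [
        sum(w * p[s] for w, p in zip(weights, posteriors))
        for s in range(n_states)
    ]
    total = sum(fused)
    return [v / total for v in fused]  # renormalize to sum to 1


def log_linear_fusion(posteriors, weights, floor=1e-10):
    """Weighted product of posteriors, computed in the log domain."""
    n_states = len(posteriors[0])
    log_scores = [
        sum(w * math.log(max(p[s], floor)) for w, p in zip(weights, posteriors))
        for s in range(n_states)
    ]
    peak = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(v - peak) for v in log_scores]
    total = sum(unnorm)
    return [v / total for v in unnorm]


# Hypothetical posteriors over three shared HMM states for one frame,
# produced by two models trained on different feature sets.
model_a = [0.7, 0.2, 0.1]
model_b = [0.4, 0.5, 0.1]
weights = [0.6, 0.4]  # assumed weighting factors

fused_linear = linear_fusion([model_a, model_b], weights)
fused_log = log_linear_fusion([model_a, model_b], weights)
```

The fusion is only meaningful because both posterior vectors are indexed by the same states, which is why the models need an equivalent topology.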

Contributor : Loïc Barrault
Submitted on : Friday, October 16, 2009 - 6:10:39 PM
Last modification on : Tuesday, January 14, 2020 - 10:38:05 AM
Long-term archiving on : Tuesday, June 15, 2010 - 9:56:57 PM


  • HAL Id : tel-00424699, version 1



Loïc Barrault. Diagnostic pour la combinaison de systèmes de reconnaissance automatique de la parole. Computer Science [cs]. Université d'Avignon, 2008. French. ⟨tel-00424699⟩


