
Harnessing Automatic Speech Transcription Systems (Attelage de systèmes de transcription automatique de la parole)

Abstract: This thesis presents work in the area of Large Vocabulary Continuous Speech Recognition (LVCSR) system combination. It focuses on methods for harnessing heterogeneous systems in order to increase the efficiency of speech recognizers while keeping latency low.

Automatic Speech Recognition (ASR) is affected by the many sources of variability present in the speech signal, so a single ASR system is usually unable to deal with all of them. Given these limitations, combination methods have been proposed as alternative strategies to improve recognition accuracy by using multiple recognizers developed at different research sites with different recognition strategies. System combination techniques are usually applied within a multi-pass ASR architecture: the outputs of two or more ASR systems are combined to estimate the most likely hypothesis among conflicting word pairs or differing hypotheses for the same part of an utterance.

The contribution of this thesis is twofold. First, we study and analyze the integrated driven decoding combination method, which consists in guiding the search algorithm of a primary ASR system with the one-best hypotheses of auxiliary systems. We then propose improvements that make driven decoding more efficient and more generalizable. The proposed method, called BONG, uses a Bag Of N-Grams built from the auxiliary hypotheses to drive the decoding.

Second, we propose a new framework for harnessing low-latency, parallelized single-pass speech recognizers. We study various theoretical harnessing models and present an example implementation based on a client/server distributed architecture. We then suggest several combination methods adapted to this architecture. First, we extend the BONG combination method to the collaboration of low-latency parallelized single-pass speech recognition systems. Then we propose an adaptation of the ROVER combination method that is performed during the decoding process, using a local vote procedure followed by voting based on word frequencies.
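The abstract describes BONG and the in-decoding ROVER adaptation only in words. As an illustration, the Python sketch below shows the two underlying ideas under simplifying assumptions: a position-free bag of n-grams built from an auxiliary one-best hypothesis, used to reward word extensions of a primary system's partial hypothesis that match it, and a frequency vote over word slots that are assumed to be already aligned across systems. The function names (`bag_of_ngrams`, `bong_bonus`, `frequency_vote`), the length-proportional bonus, and the offline, pre-aligned slots are hypothetical choices for this example, not the thesis's formulas; in the thesis the vote runs during decoding with a local vote procedure.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch of the two combination ideas named in the abstract.
# Function names and the bonus formula are illustrative assumptions only.


def bag_of_ngrams(words: Sequence[str], n_max: int = 3) -> Counter:
    """Count every n-gram (1 <= n <= n_max) of an auxiliary one-best hypothesis.

    The bag discards word positions and timings, so later matches are position-free.
    """
    bag: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            bag[tuple(words[i:i + n])] += 1
    return bag


def bong_bonus(history: Sequence[str], word: str, bag: Counter,
               n_max: int = 3, weight: float = 1.0) -> float:
    """Assumed score bonus for extending a partial hypothesis with `word`.

    Rewards the longest n-gram ending in `word` that also occurs in the
    auxiliary bag, scaled by its length; returns 0.0 when nothing matches.
    """
    context = list(history) + [word]
    for n in range(min(n_max, len(context)), 0, -1):
        if tuple(context[-n:]) in bag:
            return weight * n / n_max
    return 0.0


def frequency_vote(slots: List[List[str]]) -> List[str]:
    """ROVER-style vote over pre-aligned slots, one candidate per system.

    An empty string stands for a deletion. Ties go to the candidate seen
    first, i.e. the system listed first in each slot.
    """
    output: List[str] = []
    for candidates in slots:
        word, _count = Counter(candidates).most_common(1)[0]
        if word:  # skip slots where "no word" wins the vote
            output.append(word)
    return output


if __name__ == "__main__":
    aux = "the cat sat on the mat".split()
    bag = bag_of_ngrams(aux)
    print(bong_bonus(["on", "the"], "mat", bag))  # full trigram match: 1.0
    print(frequency_vote([["the", "the", "a"],
                          ["cat", "hat", "cat"],
                          ["", "sat", ""]]))  # ['the', 'cat']
```

In a driven-decoding setup, a bonus like `bong_bonus` might be added to the primary decoder's score with a tuned weight; the point of the bag is that the auxiliary hypothesis never has to be time-aligned with the primary search.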
Document type: Thesis

Cited literature: 91 references
Contributor: Abes Star
Submitted on: Monday, July 1, 2013 - 12:37:11 PM
Last modification on: Tuesday, March 31, 2020 - 3:21:44 PM
Long-term archiving on: Wednesday, October 2, 2013 - 4:12:32 AM


Version validated by the jury (STAR)


  • HAL Id: tel-00839990, version 1


Fethi Bougares. Attelage de systèmes de transcription automatique de la parole. Computers and Society [cs.CY]. Université du Maine, 2012. French. ⟨NNT : 2012LEMA1026⟩. ⟨tel-00839990⟩


