
Performance Prediction of Automatic Speech Recognition Systems (Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole)

Abstract : This thesis addresses performance prediction for automatic speech recognition (ASR) systems. The task is useful for measuring the reliability of transcription hypotheses on a new data collection when the reference transcription is unavailable and the ASR system is unknown (a black box). Our contribution covers several areas. First, we propose a heterogeneous French corpus for training and evaluating ASR performance prediction systems. We then compare two prediction approaches: a state-of-the-art (SOTA) approach based on engineered features, and a new strategy based on features learnt by convolutional neural networks (CNNs). While jointly using textual and signal features did not help the SOTA system, combining both inputs in the CNNs yields the best word error rate (WER) prediction performance. We also show that our CNN predicts the shape of the WER distribution over a collection of speech recordings remarkably well. Next, we analyze factors that affect both prediction approaches. We assess the impact of the amount of training data, as well as the robustness of systems trained on the outputs of one particular ASR system and used to predict performance on a new data collection. Our experimental results show that both approaches are robust, and that the prediction task is harder on short speech turns and on spontaneous speech. Finally, we try to understand which information our neural model captures and how it relates to different factors. Our experiments show that intermediate representations in the network automatically encode information about speech style, speaker accent, and broadcast program type. To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task.
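For readers unfamiliar with the metric being predicted, the following is a minimal sketch of how WER is conventionally computed when a reference transcription is available: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; they are not taken from the thesis, whose prediction systems estimate this value without access to the reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization by whitespace
    is an assumption, not the thesis's preprocessing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.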

Cited literature : 121 references
Contributor : Abes Star
Submitted on : Thursday, July 4, 2019 - 2:06:59 PM
Last modification on : Friday, March 25, 2022 - 9:44:36 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02173343, version 1


Zied Elloumi. Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole. Automatique. Université Grenoble Alpes, 2019. Français. ⟨NNT : 2019GREAM005⟩. ⟨tel-02173343⟩


