
Speech Recognition in a Lecture Context: Evaluation, Advances and Enrichment

Abstract: This thesis is part of a study that explores the potential of automatic transcription for instrumenting educational situations. Our contribution covers several axes.

First, we describe the enrichment and annotation of the COCo dataset, which we produced as part of the ANR PASTEL project. This corpus consists of videos of different lectures, each related to a particular field (natural language, graphs, functions ...). In this multi-thematic framework, we address the problem of the linguistic adaptation of automatic speech recognition (ASR) systems. The proposed language model (LM) adaptation relies both on the presentation materials provided by the teacher and on in-domain data collected automatically from the web.

Next, we focus on the problem of ASR evaluation. Existing metrics do not allow a precise evaluation of transcription quality, so we propose two evaluation protocols. The first is an intrinsic evaluation that estimates performance only on the domain words of each lecture (IWER_Average). The second is an extrinsic evaluation that estimates performance on two tasks that exploit the transcription: information retrieval and indexability. Our experimental results show that the global word error rate (WER) masks the gain provided by LM adaptation. To evaluate this gain better, it therefore seems particularly relevant to use specific measures, such as those presented in this thesis.

Since LM adaptation relies on data collected from the web, we study the reproducibility of the adaptation results by comparing the performance obtained over a long period of time. Over a one-year collection period, we were able to show that, although the web data changed in part from one month to the next, the performance of the adapted transcription systems remained constant (i.e. no significant performance changes), regardless of the period considered.

Finally, we are interested in the thematic segmentation of ASR output and in the alignment of slides with oral lectures. For thematic segmentation, integrating slide-change information into the TextTiling algorithm provides a significant gain in terms of F-measure. For the alignment of slides with oral lectures, we computed the cosine similarity between the TF-IDF representations of the transcription segments and those of the slide texts, and we imposed a constraint that respects the sequential order of the slides and transcription segments. We also considered a confidence measure to discuss the reliability of the proposed approach.
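The abstract contrasts the global WER with an error rate restricted to each lecture's domain words. The Python sketch below illustrates why the two can diverge. It aligns reference and hypothesis with a standard Levenshtein alignment, computes WER, and computes a simplified domain-word error rate: for each domain word, the fraction of its reference occurrences that were substituted or deleted, averaged over domain words. This is an assumed stand-in, not the thesis's exact IWER_Average definition; the function names, the toy French sentences and the domain-word set are hypothetical, and the domain-word measure ignores insertions.

```python
from typing import List, Set, Tuple


def align(ref: List[str], hyp: List[str]) -> Tuple[int, List[bool]]:
    """Levenshtein alignment of two word sequences.

    Returns the minimum number of edits (substitutions + deletions +
    insertions) and, for each reference word, whether it was matched exactly.
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    # Backtrace to find which reference words were recognized correctly.
    correct = [False] * n
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            correct[i - 1] = ref[i - 1] == hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1  # deletion: the reference word stays marked incorrect
        else:
            j -= 1  # insertion: no reference word involved
    return d[n][m], correct


def wer(ref: List[str], hyp: List[str]) -> float:
    """Global word error rate: number of edits divided by reference length."""
    errors, _ = align(ref, hyp)
    return errors / len(ref)


def domain_word_error_rate(ref: List[str], hyp: List[str], domain_words: Set[str]) -> float:
    """Hypothetical simplification: per-domain-word miss rate, averaged over words."""
    _, correct = align(ref, hyp)
    rates = []
    for word in sorted(domain_words):
        positions = [k for k, tok in enumerate(ref) if tok == word]
        if positions:  # skip domain words absent from this reference
            missed = sum(1 for k in positions if not correct[k])
            rates.append(missed / len(positions))
    return sum(rates) / len(rates) if rates else 0.0


ref = "le graphe est orienté et le graphe est connexe".split()
hyp_a = "le graphe est orienté et la graphe est connexe".split()  # function-word error
hyp_b = "le graph est orienté et le graphe est connexe".split()   # domain-word error
domain = {"graphe", "connexe"}

for name, hyp in (("A", hyp_a), ("B", hyp_b)):
    print(name, round(wer(ref, hyp), 3), domain_word_error_rate(ref, hyp, domain))
```

Both hypotheses contain a single substitution and therefore share the same WER (1/9), but only hypothesis B misrecognizes a domain word, which the restricted measure exposes (0.25 versus 0.0).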
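The thematic segmentation result builds on TextTiling, which scores each gap between consecutive text units by the lexical similarity of the blocks on either side and places boundaries at deep similarity valleys. The abstract does not say how slide changes are integrated, so the Python sketch below assumes one simple option: adding a fixed bonus to the depth score of every gap where the slide changes. The block size `k`, the `bonus` value, the function names and the toy units are hypothetical; the smoothing step is omitted, and the cut-off (mean minus half a standard deviation of the depth scores) follows a common TextTiling heuristic.

```python
import math
from collections import Counter
from typing import List, Set


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gap_scores(units: List[List[str]], k: int) -> List[float]:
    """Similarity across each gap g (between units g-1 and g), k units per side."""
    scores = []
    for g in range(1, len(units)):
        left = Counter(w for unit in units[max(0, g - k):g] for w in unit)
        right = Counter(w for unit in units[g:g + k] for w in unit)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores: List[float]) -> List[float]:
    """Depth of each similarity valley: climb to the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        depths.append((scores[left] - s) + (scores[right] - s))
    return depths


def segment(units: List[List[str]], slide_change_gaps: Set[int],
            k: int = 2, bonus: float = 0.5) -> List[int]:
    """Return boundary gaps; gap g separates units[g-1] and units[g]."""
    if len(units) < 2:
        return []
    depths = depth_scores(gap_scores(units, k))
    for g in slide_change_gaps:  # assumed fusion: boost gaps at slide changes
        if 1 <= g < len(units):
            depths[g - 1] += bonus
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2
    return [g for g in range(1, len(units)) if depths[g - 1] > cutoff]


units = [
    "graphe sommet arête".split(),
    "graphe arête chemin".split(),
    "fonction dérivée limite".split(),
    "fonction limite continuité".split(),
]
print(segment(units, set(), k=1))  # lexical cues only
print(segment(units, {1}, k=1))    # plus a slide change between units 0 and 1
```

With lexical cues only, the sketch places a single boundary at gap 2; adding a slide change at gap 1 pushes that gap over the cut-off as well, which shows how the assumed bonus lets slide transitions influence the segmentation. In practice such a bonus would have to be tuned on annotated lectures.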
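For slide alignment, the abstract combines a TF-IDF cosine similarity with a constraint that preserves the order of slides and transcription segments. One way to enforce such a constraint, assumed here rather than taken from the thesis, is dynamic programming: each transcription segment is assigned a slide index, indices may never decrease from one segment to the next, and the total similarity is maximized. The Python sketch weights terms by raw count times log(N/df), allows slides to be skipped, and leaves out the confidence measure; the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF vectors with tf = raw count and idf = log(N / df)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse TF-IDF vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def align_slides(segments: List[List[str]], slides: List[List[str]]) -> List[int]:
    """Assign a slide index to each segment; indices never decrease (order constraint)."""
    if not segments or not slides:
        return []
    vecs = tfidf(slides + segments)
    slide_vecs, seg_vecs = vecs[:len(slides)], vecs[len(slides):]
    sim = [[cosine(s, t) for t in slide_vecs] for s in seg_vecs]
    n, m = len(segments), len(slides)
    # best[i][j]: best total similarity for segments 0..i with segment i on slide j.
    best = [[0.0] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    best[0] = sim[0][:]
    for i in range(1, n):
        run_max, run_arg = float("-inf"), 0
        for j in range(m):
            # Best predecessor among slides 0..j keeps the assignment monotonic.
            if best[i - 1][j] > run_max:
                run_max, run_arg = best[i - 1][j], j
            best[i][j] = run_max + sim[i][j]
            back[i][j] = run_arg
    j = max(range(m), key=lambda k: best[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]  # slide chosen for segment i - 1
        path.append(j)
    return path[::-1]


slides = [["graphe", "sommet", "arête"], ["fonction", "dérivée", "limite"]]
segments = [
    "le graphe a un sommet".split(),
    "chaque arête relie deux sommet".split(),
    "la fonction admet une limite".split(),
]
print(align_slides(segments, slides))
```

The first two segments share vocabulary only with the graph slide and the third only with the function slide, so the sketch returns `[0, 0, 1]`. Because the running maximum only looks at slides `0..j`, a segment can never be mapped to an earlier slide than its predecessor, which is the order constraint described in the abstract.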

Cited literature: 307 references
Contributor: ABES STAR
Submitted on: Wednesday, September 2, 2020 - 3:20:09 PM
Last modification on: Wednesday, January 26, 2022 - 5:30:23 PM
Long-term archiving on: Wednesday, December 2, 2020 - 4:49:12 PM


Version validated by the jury (STAR)


  • HAL Id: tel-02928451, version 1


Salima Mdhaffar. Reconnaissance de la parole dans un contexte de cours magistraux : évaluation, avancées et enrichissement. Computation and Language [cs.CL]. Le Mans Université, 2020. French. ⟨NNT: 2020LEMA1008⟩. ⟨tel-02928451⟩


