
Apprentissage automatique et compréhension dans le cadre d’un dialogue homme-machine téléphonique à initiative mixte

Abstract : Spoken dialogue systems are interfaces between users and services. Simple examples of services for which these dialogue systems can be used include banking and booking (hotels, trains, flights). Dialogue systems are composed of a number of modules. The main modules include Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU), Dialogue Management and Speech Generation. In this thesis, we concentrate on the Spoken Language Understanding component of dialogue systems. In the past, it has been usual to separate the Spoken Language Understanding process from that of Automatic Speech Recognition: first, the Automatic Speech Recognition process finds the best word hypothesis; then, given this hypothesis, the best semantic interpretation is sought. This thesis presents a method for the robust extraction of basic conceptual constituents (or concepts) from an audio message. The proposed conceptual decoding model follows a stochastic paradigm and is directly integrated into the Automatic Speech Recognition process. This approach allows us to keep the probabilistic search space over sequences of words produced by the Automatic Speech Recognition module, and to project it onto a probabilistic search space of sequences of concepts. The experiments carried out on the French spoken dialogue corpus MEDIA, available through ELDA, show that the performance reached by our new approach is better than that of the traditional sequential approach. As a starting point for evaluation, the effect that deterioration of the word error rate (WER) has on SLU systems is examined through the use of different ASR outputs. The SLU performance appears to decrease linearly as a function of the ASR word error rate. We show, however, that the proposed integrated method of searching for both words and concepts gives better results than a traditional sequential approach.
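The word error rate mentioned above is conventionally defined as the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the thesis; the function name `word_error_rate` and the example utterances are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the thesis): substitutions, deletions
    and insertions all cost 1, and tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one reference word out of six is dropped by the recognizer.
example_wer = word_error_rate("je veux une chambre a Paris", "je veux chambre a Paris")
```

The concept error rate can be computed in the same way by replacing the word sequences with sequences of concept labels.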
In order to validate our approach, we conduct experiments on the MEDIA corpus under the same assessment conditions as those used during the MEDIA campaign. The goal is to produce error-free semantic interpretations from transcripts. The results show that the performance achieved by our model is as good as that of the systems involved in the evaluation campaign. Studies made on the MEDIA corpus show that the concept error rate is related to the word error rate, the size of the training corpus and the a priori knowledge added to the conceptual language models. Error analyses show the interest of modifying the probabilities of the word lattice with triggers, a template cache or by using arbitrary rules requiring passage through a portion of the graph and applying the presence of triggers (words or concepts) based on history. Methods based on machine learning are generally quite demanding in terms of the amount of training data required. By changing the size of the training corpus, the minimum and the optimal number of dialogues needed for training conceptual language models can be measured. Research conducted in this thesis aims to determine the training corpus size beyond which the semantic evaluation scores stagnate. A correlation is established between the corpus size necessary for learning and the corpus size necessary to validate the manual annotations. In the case of the MEDIA evaluation campaign, it took roughly the same number of examples, first to validate the semantic annotations and, secondly, to obtain a "quality" corpus-trained stochastic model. The addition of a priori knowledge to our stochastic models significantly reduces the size of the training corpus needed to achieve the same scores as a fully stochastic system (nearly half the size for the same score). It allows us to confirm that the addition of basic intuitive rules (numbers, zip codes, dates) gives very encouraging results.
This leads us to create a hybrid system combining corpus-based and knowledge-based models. The second part of the thesis examines the application of the understanding module to another simple dialogue system task, a call routing system. A problem with this specific task is the lack of data available for training the required language models. We attempt to resolve this issue by supplementing the in-domain data with various other generic corpora already available, and with data from the MEDIA campaign. We show the benefits of integrating a call classification task into an SLU process. Unfortunately, very little training data is available in the field under consideration. Using our integrated approach to decode concepts, along with an integrated process, we propose a bag of words and concepts approach. Used by a classifier, this approach achieved encouraging call classification rates on the test corpus, even though the WER was relatively high. The methods developed are shown to improve the robustness of the call routing system.
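The bag of words and concepts representation mentioned above can be pictured as a single feature vector that counts both the recognized words and the decoded concept labels of an utterance. The sketch below is only an illustration of that idea under stated assumptions: the function name, the `w:` and `c:` prefixes, and the example words and concept labels are hypothetical and are not taken from the thesis or from the MEDIA annotation scheme.

```python
from collections import Counter
from typing import Iterable


def bag_of_words_and_concepts(words: Iterable[str], concepts: Iterable[str]) -> Counter:
    """Merge word counts and concept counts into one sparse feature bag.

    Hypothetical helper: the "w:" and "c:" prefixes keep a word and a
    concept with the same surface form in separate feature dimensions.
    """
    features = Counter()
    for word in words:
        features["w:" + word.lower()] += 1
    for concept in concepts:
        features["c:" + concept] += 1
    return features


# Hypothetical utterance: recognized words plus concept labels produced by the SLU decoder.
features = bag_of_words_and_concepts(
    ["I", "want", "to", "pay", "my", "bill", "my"],
    ["payment", "bill"],
)
```

A feature bag of this kind can then be passed to any standard text classifier to predict the call type. Because concept features summarize several words at once, they may remain informative when individual words are misrecognized.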
Contributor : ABES STAR
Submitted on : Tuesday, May 10, 2011 - 4:49:05 PM
Last modification on : Thursday, November 25, 2021 - 3:12:06 PM
Long-term archiving on : Thursday, August 11, 2011 - 2:36:36 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00591997, version 1



Christophe Servan. Apprentissage automatique et compréhension dans le cadre d’un dialogue homme-machine téléphonique à initiative mixte. Autre [cs.OH]. Université d'Avignon, 2008. Français. ⟨NNT : 2008AVIG0173⟩. ⟨tel-00591997⟩


