
Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems

Abstract: This thesis proposes to learn the behaviour of the dialogue manager of a spoken dialogue system from a set of rated dialogues, through reinforcement learning. The method requires neither a hand-crafted representation of the state space nor a reward function: these two high-level parameters are learnt from the corpus of rated dialogues. The dialogue designer can thus optimise dialogue management simply by defining the dialogue logic and a criterion to maximise (e.g. user satisfaction).

The methodology first considers the dialogue parameters necessary to compute a state-space representation relevant to the criterion to be maximised. For instance, if the chosen criterion is user satisfaction, then parameters such as dialogue duration and the average speech-recognition confidence score must be taken into account. The state space is represented as a sparse distributed memory. The Genetic Sparse Distributed Memory for Reinforcement Learning (GSDMRL) accommodates many dialogue parameters and, through genetic evolution, selects those most important for learning. The resulting state space and the policy learnt on it are easily interpretable by the system designer.

Secondly, the rated dialogues are used to learn a reward function that teaches the system to optimise the criterion. Two algorithms, reward shaping and distance minimisation, are proposed to learn this reward function; both treat the criterion as the return for the entire dialogue. These functions are discussed and compared on simulated dialogues, and it is shown that they enable faster learning than using the criterion directly as the final reward.

A spoken dialogue system for appointment scheduling was designed during this thesis, building on previous systems, and a corpus of rated dialogues with this system was collected. This corpus illustrates the scaling capability of the state-space representation and is a good example of an industrial spoken dialogue system to which the methodology could be applied.
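To make the reward-learning idea concrete, here is a minimal sketch of the distance-minimisation view described above, under the assumption of a linear per-turn reward: each dialogue is a sequence of turn feature vectors, and the weights are fit by least squares so that the summed per-turn reward matches the dialogue-level rating (the return). The feature names, dimensionality, and the purely linear form are illustrative assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch: learn a linear per-turn reward r(s) = w . phi(s)
# such that the reward summed over each dialogue matches that dialogue's
# overall rating. Features, dimensions, and linearity are assumptions.
import numpy as np

def learn_reward_weights(dialogues, ratings):
    """dialogues: list of (T_i x d) arrays of per-turn feature vectors.
    ratings:   list of scalar dialogue-level ratings (returns).
    Returns w minimising sum_i (sum_t w . phi(s_it) - rating_i)^2."""
    # Each dialogue contributes one row: the sum of its turn features,
    # because the return of a dialogue is the sum of its per-turn rewards.
    X = np.array([d.sum(axis=0) for d in dialogues])  # shape (N, d)
    y = np.asarray(ratings, dtype=float)              # shape (N,)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy example with 3 illustrative features per turn
# (e.g. duration, ASR confidence, task-success flag).
rng = np.random.default_rng(0)
true_w = np.array([-0.1, 0.5, 1.0])
dialogues = [rng.random((int(rng.integers(5, 15)), 3)) for _ in range(50)]
ratings = [d.sum(axis=0) @ true_w for d in dialogues]  # noiseless returns
w = learn_reward_weights(dialogues, ratings)
```

With noiseless synthetic ratings the recovered weights coincide with the generating ones; on real rated dialogues the fit is approximate, and the learnt per-turn reward then provides the denser learning signal the abstract refers to.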



Version validated by the jury (STAR)


  • HAL Id: tel-01809184, version 1


Layla El Asri. Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems. Machine Learning [cs.LG]. Université de Lorraine, 2016. English. ⟨NNT : 2016LORR0350⟩. ⟨tel-01809184⟩


