
Reinforcement learning for Dialogue Systems optimization with user adaptation.

Nicolas Carrara 1, 2, 3
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : The most powerful artificial intelligence systems are now based on learned statistical models. In order to build efficient models, these systems must collect a huge amount of data on their environment. Personal assistants, smart homes, voice servers and other dialogue applications are no exception. A specificity of those systems is that they are designed to interact with humans, and as a consequence, their training data has to be collected from interactions with these humans. As the number of interactions with a single person is often too small to train a proper model, the usual approach to maximise the amount of data consists in mixing data collected with different users into a single corpus. However, one limitation of this approach is that, by construction, the trained models are only efficient with an "average" human and do not include any sort of adaptation; this lack of adaptation makes the service unusable for some specific groups of people and leads to a restricted customer base and inclusiveness problems. This thesis proposes solutions to construct Dialogue Systems that are robust to this problem by combining Transfer Learning and Reinforcement Learning. It explores two main ideas.
The first idea of this thesis consists in incorporating adaptation in the very first dialogues with a new user. To that end, we use the knowledge gathered with previous users. But how can such systems scale with a growing database of user interactions? The first proposed approach involves clustering Dialogue Systems (each tailored to its respective user) based on their behaviours. We demonstrated, through experiments with handcrafted and real user models, how this method improves the dialogue quality for new and unknown users. The second approach extends the Deep Q-learning algorithm with a continuous transfer process.
The second idea states that before using a dedicated Dialogue System, the first interactions with a user should be handled carefully by a safe Dialogue System common to all users. The underlying approach is divided into two steps. The first step consists in learning a safe strategy through Reinforcement Learning. To that end, we introduced a budgeted Reinforcement Learning framework for continuous state spaces and the underlying extensions of classic Reinforcement Learning algorithms. In particular, the safe version of the Fitted-Q algorithm has been validated, in terms of safety and efficiency, on a dialogue system task and an autonomous driving problem. The second step consists in using those safe strategies when facing new users; this method is an extension of the classic ε-greedy algorithm.
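For readers unfamiliar with the baseline mentioned in the last sentence, the following is a minimal sketch of the classic ε-greedy action selection rule only. It is not the extension proposed in the thesis, whose details are not given in this abstract. The function name `epsilon_greedy`, its parameters, and the example values are illustrative assumptions.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q_values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Classic epsilon-greedy selection (hypothetical helper, not the thesis method).

    With probability epsilon, pick an action uniformly at random (exploration);
    otherwise pick the action with the highest estimated value (exploitation).
    """
    source = rng if rng is not None else random
    if source.random() < epsilon:
        # Explore: any action index, chosen uniformly.
        return source.randrange(len(q_values))
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative values only: three actions with made-up value estimates.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random; values in between trade off exploration against exploitation.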
Contributor : Nicolas Carrara
Submitted on : Monday, December 23, 2019 - 5:49:30 PM
Last modification on : Friday, December 11, 2020 - 6:44:05 PM
Long-term archiving on : Tuesday, March 24, 2020 - 12:31:17 PM




  • HAL Id : tel-02422691, version 1


Nicolas Carrara. Reinforcement learning for Dialogue Systems optimization with user adaptation. Artificial Intelligence [cs.AI]. Ecole Doctoral Science pour l'Ingénieur Université Lille Nord-de-France, 2019. English. ⟨tel-02422691⟩


