N. Abe, P. Melville, C. Pendus, K. Chandan, D. L. Reddy et al., Optimizing Debt Collections Using Constrained Reinforcement Learning, Conference of the Special Interest Group on Knowledge Discovery and Data Mining, 2010.

J. Achiam, D. Held, A. Tamar, and P. Abbeel, Constrained Policy Optimization, International Conference on Machine Learning, p.86, 2017.

J. Allen, Natural Language Understanding, vol.20, 1995.

E. Altman, Constrained Markov Decision Processes, vol.70, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00074109

H. Bou Ammar, E. Eaton, M. E. Taylor, D. C. Mocanu et al., An automated measure of MDP similarity for transfer in reinforcement learning, Workshops at the Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, 2014.

I. Asimov, The Machine That Won the War. The Magazine of Fantasy and Science Fiction, 1961.

Chapter A. Continuous transfer in Deep Q-learning

P. Auer, N. Cesa-bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine learning, 2002.

J. L. Austin, How to do things with words, p.28, 1962.

S. Banach, Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales, Fundamenta Mathematicae, p.37, 1922.

M. Barlier, R. Laroche, and O. Pietquin, Training Dialogue Systems With Human Advice, International Conference on Autonomous Agents and Multiagent Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01945831

M. Barlier, J. Perolat, R. Laroche, and O. Pietquin, Human-Machine Dialogue as a Stochastic Game, Conference of the Special Interest Group on Discourse and Dialogue, vol.55, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01225848

A. Barreto, D. Borsa, J. Quan, T. Schaul, D. Silver et al., Transfer in deep reinforcement learning using successor features and generalised policy improvement, 2018.

R. Bellman, Dynamic programming and Lagrange multipliers, National Academy of Sciences of the USA, p.37, 1956.

, A Markovian decision process, Journal of Mathematics and Mechanics, vol.37, p.33, 1957.

R. Bellman and S. E. Dreyfus, Functional Approximations and Dynamic Programming, Mathematics of Computation, vol.38, 1959.

B. Gold, Word Recognition Computer Program, M.I.T., p.19, 1966.

D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Optimization and Neural Computation Series), Athena Scientific, p.39, 1996.

F. J. Beutler and K. W. Ross, Optimal policies for controlled Markov chains with a constraint, Journal of Mathematical Analysis and Applications, vol.71, 1985.

A. W. Biermann and P. M. Long, The composition of messages in speech-graphics interactive systems, International Symposium on Spoken Dialogue, 1996.

M. Bińkowski, J. Donahue, S. Dieleman, A. Clark, E. Elsen et al., High Fidelity Speech Synthesis with Adversarial Networks, p.19, 2019.

D. G. Bobrow, Natural Language Input for a Computer Problem Solving System, vol.20, 1964.

D. G. Bobrow, R. M. Kaplan, M. Kay, D. A. Norman, H. Thompson et al., GUS, a Frame-driven Dialog System, Artificial Intelligence, 1977.

C. Boutilier and T. Lu, Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes, Conference on Uncertainty in Artificial Intelligence, vol.86, 2016.

P. Budzianowski and I. Vulic, Hello, It's GPT-2 - How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems, p.97, 2019.

N. Carrara, R. Laroche, J. Bouraoui, T. Urvoy, and O. Pietquin, A Fitted-Q Algorithm for Budgeted MDPs, Workshop on Safety, Risk and Uncertainty in Reinforcement Learning, Conference on Uncertainty in Artificial Intelligence, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01928092

, A Fitted-Q Algorithm for Budgeted MDPs, European Workshop on Reinforcement Learning, 2018.

, Safe transfer learning for dialogue applications, International Conference on Statistical Language and Speech Processing, vol.50, 2018.

N. Carrara, R. Laroche, and O. Pietquin, Online learning and transfer for user adaptation in dialogue systems, Joint special session on negotiation dialog, Workshop on the Semantics and Pragmatics of Dialogue-Conference of the Special Interest Group on Discourse and Dialogue, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01557775

N. Carrara, E. Leurent, R. Laroche, T. Urvoy, J. Bouraoui et al., Budgeted Reinforcement Learning in Continuous State Space, Workshop on Safety Risk and Uncertainty in Reinforcement Learning at Conference on Uncertainty in Artificial Intelligence, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02375727

N. Carrara, E. Leurent, R. Laroche, T. Urvoy, O. Maillard et al., Budgeted Reinforcement Learning in Continuous State Space, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02375727

I. Casanueva, T. Hain, H. Christensen, R. Marxer, and P. Green, Knowledge transfer between speakers for personalised dialogue management, Conference of the Special Interest Group on Discourse and Dialogue, pp.49-51, 2015.

S. Chandramohan, M. Geist, F. Lefevre, and O. Pietquin, Clustering behaviors of spoken dialogue systems users, IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00685009

, Co-adaptation in spoken dialogue systems, Natural interaction with robots, knowbots and smartphones, p.97, 2014.

S. Chandramohan, M. Geist, and O. Pietquin, Optimizing Spoken Dialogue Management with Fitted Value Iteration, Conference of the International Speech Communication Association, vol.80, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184

D. S. Chaplot, G. Lample, K. Mysore Sathyendra, and R. Salakhutdinov, Transfer deep reinforcement learning in 3D environments: An empirical study, Deep Reinforcement Learning Workshop at Conference on Neural Information Processing Systems, p.116, 2016.

H. Chen, X. Liu, D. Yin, and J. Tang, A Survey on Dialogue Systems: Recent Advances and New Frontiers, ACM SIGKDD Explorations Newsletter, vol.27, p.22, 2017.

L. Chen, C. Chang, Z. Chen, B. Tan, M. Gasic et al., Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management, IEEE International Conference on Acoustics, Speech and Signal Processing, vol.51, p.48, 2018.

A.3 Conclusion

J. Chorowski and N. Jaitly, Towards better decoding and language model integration in sequence to sequence models, Conference of the International Speech Communication Association, p.27, 2017.

Y. Chow, M. Ghavamzadeh, L. Janson, and M. Pavone, Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, Journal of Machine Learning Research, vol.86, p.70, 2018.

Y. Chow, A. Tamar, S. Mannor, and M. Pavone, Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach, Conference on Neural Information Processing Systems, 2015.

A. C. Clarke, 2001: A Space Odyssey, p.18, 1968.

K. M. Colby, Artificial Paranoia: A Computer Simulation of Paranoid Processes, p.21, 1975.

J. W. Cortada, The Digital Hand: Volume II: How Computers Changed the Work of American Financial, Telecommunications, Media, and Entertainment Industries, 2005.

H. Cuayahuitl, S. Keizer, and O. Lemon, Strategic Dialogue Management via Deep Reinforcement Learning, Workshop on Deep Reinforcement Learning, Conference on Neural Information Processing Systems, 2015.

L. Cummings, The Routledge pragmatics encyclopedia. Routledge, p.21, 2010.

C. Dann, L. Li, W. Wei, and E. Brunskill, Policy Certificates: Towards Accountable Reinforcement Learning, International Conference on Machine Learning, vol.70, 2019.

K. H. Davis, R. Biddulph, and S. Balashek, Automatic recognition of spoken digits, Journal of the Acoustical Society of America, 1952.

P. Dayan, Improving generalization for temporal difference learning: The successor representation, Neural Computation, p.116, 1993.

M. P. Deisenroth and C. E. Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, International Conference on Machine Learning, p.116, 2011.

L. Deng, G. Tür, X. He, and D. Z. Hakkani-tür, Use of kernel deep convex networks and end-to-end learning for spoken language understanding, IEEE Spoken Language Technology Workshop, p.28, 2012.

D. H. Klatt, How Klattalk became DECtalk: An Academic's Experiences in the Business World, The official proceedings of Speech Technology, 1987.

A. Deoras and R. Sarikaya, Deep belief network based semantic taggers for spoken language understanding, Conference of the International Speech Communication Association, 2013.

W. Eckert, E. Levin, and R. Pieraccini, User modeling for spoken dialogue system evaluation, IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, vol.49, 1997.

U. Ehsan, B. Harrison, L. Chan, and M. O. Riedl, Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations, Conference on AI, Ethics, and Society, Association for Computing Machinery-Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, 2018.

L. El Asri, Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems, vol.34, 2016.
URL : https://hal.archives-ouvertes.fr/tel-01809184

L. El Asri, J. He, and K. Suleman, A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems, Conference of the International Speech Communication Association, p.49, 2016.

L. El Asri, H. Khouzaimi, R. Laroche, and O. Pietquin, Ordinal regression for interaction quality prediction, IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107499

L. El Asri and R. Laroche, Will my Spoken Dialogue System be a Slow Learner?, Conference of the Special Interest Group on Discourse and Dialogue, vol.42, 2013.

L. El Asri, R. Laroche, and O. Pietquin, Reward Function Learning for Dialogue Management, Frontiers in Artificial Intelligence and Applications, p.51, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749430

, Task Completion Transfer Learning for Reward Inference, Workshop, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, vol.51, p.50, 2014.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, International Conference on Machine Learning, vol.48, 2006.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.55, p.40, 2005.

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, 2009.

W. Fedus, C. Gelada, Y. Bengio, M. G. Bellemare et al., Hyperbolic discounting and learning over multiple horizons, p.36, 2019.

K. Ferguson and S. Mahadevan, Proto-transfer learning in markov decision processes using spectral methods, Computer Science Department Faculty Publication Series, vol.46, 2006.

E. Ferrante, A. Lazaric, and M. Restelli, Transfer of task representation in reinforcement learning using policy-based proto-value functions, International Conference on Autonomous Agents and Multiagent Systems, vol.46, 2008.

E. Ferreira and F. Lefèvre, Social Signal and User Adaptation in Reinforcement Learning-based Dialogue Management, Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication, p.41, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01315527

D. Ferrucci, E. Brown, J. Chu-carroll, J. Fan, D. Gondek et al., Building Watson: An Overview of the DeepQA Project, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, vol.20, 2010.

C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, International Conference on Machine Learning, p.116, 2017.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, 1980.

J. Gao, M. Galley, and L. Li, Neural approaches to conversational AI, Foundations and Trends in Information Retrieval, vol.30, p.21, 2019.

J. Garcia and F. Fernandez, A Comprehensive Survey on Safe Reinforcement Learning, Journal of Machine Learning Research, 2015.

M. Gasic, C. Breslin, M. Henderson, D. Kim, M. Szummer et al., POMDP-based dialogue manager adaptation to extended domains, Conference of the Special Interest Group on Discourse and Dialogue, pp.48-51, 2013.

M. Gasic and S. Young, Gaussian processes for pomdp-based dialogue manager optimization, Speech, and Language Processing, 2013.

P. Geibel and F. Wysotzki, Risk-sensitive reinforcement learning applied to control under constraints, Journal of Artificial Intelligence Research, p.86, 2005.

A. Genevay and R. Laroche, Transfer Learning for User Adaptation in Spoken Dialogue Systems, International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, vol.50, p.89, 2016.

K. Georgila and D. R. Traum, Reinforcement Learning of Argumentation Dialogue Policies in Negotiation, Conference of the International Speech Communication Association, 2011.

J. Glass, G. Flammia, D. Goodine, M. Phillips, J. Polifroni et al., Multilingual spokenlanguage understanding in the MIT Voyager system, Speech Communication, 1995.

D. Goddeau, H. Meng, J. Polifroni, S. Seneff, and S. Busayapongchai, A form-based dialogue manager for spoken language applications, International Conference on Spoken Language Processing, 1996.

E. Goldberg, N. Driedger, and R. I. Kittredge, Using Natural-Language Processing to Produce Weather Forecasts, IEEE Expert: Intelligent Systems and Their Applications, 1994.

A. Graves and N. Jaitly, Towards End-to-end Speech Recognition with Recurrent Neural Networks, International Conference on Machine Learning, vol.27, p.19, 2014.

H. B. Hashemi, A. Asiaee, and R. Kraft, Query Intent Detection using Convolutional Neural Networks, Workshop on Query Understanding, International Conference on Web Search and Data Mining, 2016.

M. Henderson, Machine Learning for Dialog State Tracking: A Review, International Workshop on Machine Learning in Spoken Language Processing, 2015.

M. Henderson, B. Thomson, and S. Young, Deep Neural Network Approach for the Dialog State Tracking Challenge, Conference of the Special Interest Group on Discourse and Dialogue, 2013.

S. Hochreiter and J. Schmidhuber, Long Short-term Memory, Neural computation, p.19, 1997.

H. Dudley, R. Riesz, and S. Watkins, A Synthetic Speaker, Journal of the Franklin Institute, p.18, 1939.

R. A. Howard, Dynamic Programming and Markov Processes, p.37, 1960.

P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero et al., Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data, ACM International Conference on Information & Knowledge Management, 2013.

V. Ilievski, C. Musat, A. Hossmann, and M. Baeriswyl, Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning, International Joint Conference on Artificial Intelligence, vol.51, p.48, 2018.

A. Garland, Ex Machina, vol.20, 2015.

G. N. Iyengar, Robust Dynamic Programming, Mathematics of Operations Research, vol.70, 2005.

C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, p.89, 1992.

S. Janarthanam and O. Lemon, Adaptive referring expression generation in spoken dialogue systems: Evaluation with real users, Conference of the Special Interest Group on Discourse and Dialogue, vol.55, 2010.

F. Jelinek, Continuous Speech Recognition by Statistical Methods, Proceedings of the IEEE, vol.64, p.19, 1976.

S. Jonze, Her, p.21, 2013.

D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, p.27, 2000.

K. Kang, S. Belkhale, G. Kahn, P. Abbeel, and S. Levine, Generalization through simulation: Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight, vol.116, p.97, 2019.

K. J. Åström, Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, 1965.

L. Kaufman and P. Rousseeuw, Clustering by Means of Medoids, Data Analysis based on the L1-Norm and Related Methods, 1987.

S. Keizer, M. Gasic, F. Jurcicek, F. Mairesse, B. Thomson et al., Parameter estimation for agenda-based user simulation, Conference of the Special Interest Group on Discourse and Dialogue, p.49, 2010.

S. Keizer and V. Rieser, The MaDrIgAL Project: Multi-Dimensional Interaction Management and Adaptive Learning, International Workshop on Domain Adaptation for Dialog Agents, 2016.

, Towards Learning Transferable Conversational Skills using Multi-dimensional Dialogue Modelling, Workshop on the Semantics and Pragmatics of Dialogue, vol.51, p.48, 2018.

H. Khouzaimi, R. Laroche, and F. Lefevre, Optimising turn-taking strategies with reinforcement learning, Conference of the Special Interest Group on Discourse and Dialogue, vol.58, 2015.

, Incremental human-machine dialogue simulation, Dialogues with Social Robots, p.49, 2017.

J. Kim, M. El-khamy, and J. Lee, Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition, Conference of the International Speech Communication Association, p.27, 2017.

S. Kim and M. L. Seltzer, Towards Language-Universal End-to-End Speech Recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, p.27, 2018.

M. G. Lagoudakis and R. Parr, Least-squares Policy Iteration, Journal of Machine Learning Research, vol.40, 2003.

P. Langley, Transfer of knowledge in cognitive systems, workshop on Structural Knowledge Transfer for Machine Learning at International Conference on Machine Learning, p.47, 2006.

R. Laroche, The complex negotiation dialogue game, Workshop on the Semantics and Pragmatics of Dialogue, vol.96, p.49, 2017.

R. Laroche, P. Bretier, and G. Putois, Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience, Conference of the International Speech Communication Association, 2010.

R. Laroche and A. Genevay, The negotiation dialogue game, Dialogues with Social Robots, vol.55, p.23, 2017.

R. Laroche, G. Putois, and P. Bretier, Optimising a handcrafted dialogue system design, Conference of the International Speech Communication Association, 2010.

R. Laroche, G. Putois, P. Bretier, and B. Bouchon-meunier, Hybridisation of expertise and reinforcement learning in dialogue systems, Conference of the International Speech Communication Association, 2009.

R. Laroche, P. Trichelair, and R. Tachet-des-combes, Safe Policy Improvement with Baseline Bootstrapping, vol.115, p.86, 2019.

G. A. Larson, Knight Rider, National Broadcasting Company, p.18, 1986.

B. Lavoie, O. Rambow, and E. Reiter, Customizable Descriptions of Object-Oriented Models, Applied Natural Language Processing, 1997.

A. Lazaric, Knowledge transfer in reinforcement learning, vol.46, 2008.

, Transfer in Reinforcement Learning: a Framework and a Survey, Reinforcement Learning -State of the art, vol.89, pp.45-47, 2012.

A. Lazaric, M. Restelli, and A. Bonarini, Transfer of samples in batch reinforcement learning, International Conference on Machine Learning, vol.57, p.55, 2008.

H. M. Le, C. Voloshin, and Y. Yue, Batch Policy Learning under Constraints, International Conference on Machine Learning, p.86, 2019.

Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, Object Recognition with Gradient-Based Learning, Shape, Contour and Grouping in Computer Vision, 1999.

K. Lee, C. Park, N. Kim, and J. Lee, Accelerating Recurrent Neural Network Language Model Based Online Speech Recognition System, IEEE International Conference on Acoustics, Speech and Signal Processing, p.27, 2018.

O. Lemon, Learning what to say and how to say it: Joint optimisation of spoken dialogue management and natural language generation, Computer Speech & Language, 2011.

O. Lemon and X. Liu, Dialogue policy learning for combinations of noise and user simulation: transfer results, Conference of the Special Interest Group on Discourse and Dialogue, p.49, 2007.

O. Lemon and O. Pietquin, Data-driven methods for adaptive spoken dialogue systems: Computational learning for conversational interfaces, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756740

E. Leurent, Y. Blanco, D. Efimov, and O. Maillard, Approximate Robust Control of Uncertain Dynamical Systems, Workshop on Machine Learning for Intelligent Transportation Systems, Conference on Neural Information Processing Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01931744

E. Levin and R. Pieraccini, A stochastic model of computer-human interaction for learning dialogue strategies, European Conference on Speech Communication and Technology, vol.49, 1997.

E. Levin, R. Pieraccini, and W. Eckert, A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on speech and audio processing, 2000.

J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley et al., Deep Reinforcement Learning for Dialogue Generation, Conference on Empirical Methods in Natural Language Processing, p.29, 2016.

L. Li, J. Williams, and S. Balakrishnan, Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection, Conference of the International Speech Communication Association, vol.80, p.55, 2009.

X. Li, Y. Chen, L. Li, J. Gao, and A. Celikyilmaz, End-to-end task-completion neural dialogue systems, 2017.

C. Lin, ROUGE: A Package for Automatic Evaluation of Summaries, Workshop on Text Summarization Branches Out, Annual Meeting of the Association for Computational Linguistics, 2004.

P. Lison, Model-based Bayesian Reinforcement Learning for Dialogue Management, Conference of the International Speech Communication Association, vol.40, 2013.

C. Liu, X. Xu, and D. Hu, Multiobjective Reinforcement Learning: A Comprehensive Overview, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2014.

R. Lowe, M. Noseworthy, I. Vlad-serban, N. Angelard-gontier, Y. Bengio et al., Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses, Annual Meeting of the Association for Computational Linguistics, 2017.

R. Lowe, N. Pow, I. Serban, and J. Pineau, The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems, p.39, 2015.

D. G. Luenberger, Investment science, 2013.

J. Macqueen, Some methods for classification and analysis of multivariate observations, Berkeley symposium on mathematical statistics and probability, vol.55, 1967.

S. Mahadevan and M. Maggioni, Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research, vol.46, 2007.

M. M. Mahmud, M. Hawasly, B. Rosman, and S. Ramamoorthy, Clustering markov decision processes for continual transfer, vol.65, p.55, 2013.

H. Mausser and D. Rosen, Beyond VaR: from measuring risk to managing risk, IEEE Conference on Computational Intelligence for Financial Engineering, 2003.

G. Mesnil, X. He, L. Deng, and Y. Bengio, Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, Conference of the International Speech Communication Association, 2013.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.77, p.43, 2015.

K. Mo, S. Li, Y. Zhang, J. Li, and Q. Yang, Personalizing a Dialogue System with Transfer Reinforcement Learning, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, vol.51, p.50, 2018.

G. Moore, The Imitation Game, vol.20, 2014.

K. Nadjahi, R. Laroche, and R. Tachet-des-combes, Safe Policy Improvement with Soft Baseline Bootstrapping, European Conference on Machine Learning, p.86, 2019.

A. Neustein and J. A. Markowitz, Mobile speech and advanced natural language solutions, 2013.

A. Nilim and L. E. Ghaoui, Robust Control of Markov Decision Processes with Uncertain Transition Matrices, Mathematics of Operations Research, 2005.

A. H. Oh and A. I. Rudnicky, Stochastic language generation for spoken dialogue systems, Advances in Natural Language Processing -NAACL, 2000.

A. van den Oord, Y. Li, I. Babuschkin, K. Simonyan et al., Parallel WaveNet: Fast High-Fidelity Speech Synthesis, p.29, 2018.

OpenAI, OpenAI Five, p.96, 2018.

D. Palossi, F. Conti, and L. Benini, An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs, p.97, 2019.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, Bleu: a Method for Automatic Evaluation of Machine Translation, Annual Meeting of the Association for Computational Linguistics, 2002.

B. Peng, X. Li, J. Gao, J. Liu, K. Wong et al., Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning, Annual Meeting of the Association for Computational Linguistics, 2018.

R. Perera and P. Nand, Recent Advances in Natural Language Generation: A Survey and Classification of the Empirical Literature, Computing and Informatics, 2017.

S. Perez, Microsoft silences its new A.I. bot Tay, after Twitter users teach it racism, p.97, 2016.

M. Petrik, M. Ghavamzadeh, and Y. Chow, Safe policy improvement by minimizing robust baseline regret, Conference on Neural Information Processing Systems, p.86, 2016.

O. Pietquin, A Framework for Unsupervised Learning of Dialogue Strategies, Presses Universitaires de Louvain, vol.49, p.21, 2004.

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing (TSLP), vol.55, p.40, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00617517

P. Poupart, A. Malhotra, P. Pei, K. Kim, B. Goh et al., Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, p.86, 2015.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei et al., Language Models are Unsupervised Multitask Learners, vol.97, p.22, 2019.

O. Rambow, S. Bangalore, and M. Walker, Natural Language Generation in Dialog Systems, International Conference on Human Language Technology Research, 2001.

C. E. Rasmussen, Gaussian processes in machine learning, vol.48, 2003.

M. Riedmiller, Neural Fitted Q Iteration -First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning, vol.42, 2005.

C. Rogers, Counseling and psychotherapy, p.21, 1942.

D. M. Roijers, P. Vamplew, S. Whiteson, and R. Dazeley, A Survey of Multi-Objective Sequential Decision-Making, Journal of Artificial Intelligence Research, vol.71, p.70, 2013.

F. Rosenblatt, The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain, Psychological Review, 1958.

N. Roy, J. Pineau, and S. Thrun, Spoken dialogue management using probabilistic reasoning, Annual Meeting of the Association for Computational Linguistics, 2000.

P. Rubin and L. Goldstein, The Pattern Playback, 2019.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol.1, 1986.

G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, 1994.

M. D. Sadek, P. Bretier, and F. Panaget, ARTIMIS: Natural dialogue meets rational agency, vol.28, p.21, 1997.

F. Sadri, F. Toni, and P. Torroni, Dialogues for negotiation: agent varieties and dialogue sequences, International Workshop on Agent Theories, Architectures, and Languages, vol.55, 2001.

H. Saeed, RightClick.io Uses AI-Powered Chatbot to Create a Website, 2019.

R. Sarikaya, G. E. Hinton, and B. Ramabhadran, Deep belief nets for natural language call-routing, IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.

R. C. Schank and L. Tesler, A Conceptual Dependency Parser for Natural Language, Conference on Computational Linguistics, vol.20, 1969.

J. Schatzmann, Statistical User and Error Modelling for Spoken Dialogue Systems, p.49, 2008.

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, IEEE Workshop on Automatic Speech Recognition and Understanding, p.49, 2005.

R. Scott, Blade Runner, vol.20, 1982.

J. R. Searle, Speech acts: An essay in the philosophy of language, p.28, 1969.

I. V. Serban, R. Lowe, P. Henderson, L. Charlin, and J. Pineau, A survey of available corpora for building data-driven dialogue systems, 2015.

I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau, Building end-to-end dialogue systems using generative hierarchical neural network models, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, vol.97, p.30, 2016.

Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil, Learning Semantic Representations Using Convolutional Neural Networks for Web Search, International Conference on World Wide Web, 2014.

A. A. Sherstov and P. Stone, Improving action selection in MDP's via knowledge transfer, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, vol.46, 2005.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, 2016.

S. Singh, D. Litman, M. Kearns, and M. Walker, Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system, Journal of Artificial Intelligence Research, 2002.

H. Steinhaus, Sur la division des corps matériels en partie, Bulletin de l'academie polonaise des sciences, vol.55, 1957.

F. T. Sunmola and J. L. Wyatt, Model transfer for Markov decision tasks via parameter matching, Workshop of the UK Planning and Scheduling Special Interest Group, vol.46, 2006.

R. S. Sutton, D. Precup, and S. Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial intelligence, vol.46, 1999.

A. Tamar, D. D. Castro, and S. Mannor, Policy Gradients with Variance Related Risk Criteria, International Conference on Machine Learning, vol.70, 2012.

M. E. Taylor, N. K. Jong, and P. Stone, Transferring instances for model-based reinforcement learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol.46, 2008.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, vol.89, p.45, 2009.

M. E. Taylor, S. Whiteson, and P. Stone, Transfer via inter-task mappings in policy search reinforcement learning, International Conference on Autonomous Agents and Multiagent Systems, vol.46, 2007.

H. Terry, Baidu's Melody -AI Powered Conversational Bot for Doctors and Patients -The Digital Insurer, 2019.

P. Thomas, G. Theocharous, and M. Ghavamzadeh, High confidence policy improvement, International Conference on Machine Learning, p.86, 2015.

B. Thomson, Statistical Methods for Spoken Dialogue Management, 2013.

B. Thomson and S. Young, Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems, Computer Speech & Language, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00621617

A. N. Tikhonov, Regularization of incorrectly posed problems, Doklady Akademii Nauk SSSR, vol.64, 1963.

L. Torrey, T. Walker, J. Shavlik, and R. Maclin, Using advice to transfer knowledge acquired in one reinforcement learning task to another, European Conference on Machine Learning, vol.46, 2005.

G. Tur, L. Deng, D. Hakkani-Tür, and X. He, Towards deeper understanding: Deep convex networks for semantic utterance classification, IEEE International Conference on Acoustics, Speech and Signal Processing, p.135, 2012.

A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, vol.20, 1936.

A. M. Turing, Computing machinery and intelligence, Mind, vol.20, 1950.

S. Ultes, M. Kraus, A. Schmitt, and W. Minker, Quality-adaptive spoken dialogue initiative selection and implications on reward modelling, Conference of the Special Interest Group on Discourse and Dialogue, vol.55, 2015.

S. Ultes, L. M. Rojas-barahona, P. Su, D. Vandyke, D. Kim et al., PyDial: A Multi-domain Statistical Dialogue System Toolkit, Annual Meeting of the Association for Computational Linguistics, p.22, 2017.

A. Undurti, A. Geramifard, N. Roy, and J. P. How, Function Approximation for Continuous Constrained MDPs, p.86, 2010.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: A generative model for raw audio, Speech Synthesis Workshop, vol.29, p.19, 2016.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Conference on Neural Information Processing Systems, 2017.

O. Vinyals, I. Babuschkin, J. Chung, M. Mathieu, M. Jaderberg et al., AlphaStar: Mastering the Real-Time Strategy Game StarCraft II, p.97, 2019.

H. de Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle et al., GuessWhat?! Visual Object Discovery through Multimodal Dialogue, IEEE Conference on Computer Vision and Pattern Recognition, vol.43, p.29, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01549641


J. W. Forgie and C. D. Forgie, Results Obtained from a Vowel Recognition Computer Program, The Journal of the Acoustical Society of America, 1959.

M. A. Walker, Informational redundancy and resource bounds in dialogue, The Institute for Research in Cognitive Science, 1993.

M. A. Walker, J. C. Fromer, and S. Narayanan, Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email, Annual Meeting of the Association for Computational Linguistics, 1998.

Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss et al., Tacotron: Towards End-to-End Speech Synthesis, Conference of the International Speech Communication Association, 2017.

Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos et al., Sample efficient actor-critic with experience replay, p.43, 2016.

G. Weisz, P. Budzianowski, P. Su, and M. Gasic, Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.

J. Weizenbaum, ELIZA: a computer program for the study of natural language communication between man and machine, Communications of the Association for Computing Machinery, 1966.

T.-H. Wen, Y. Miao, P. Blunsom, and S. J. Young, Latent Intention Dialogue Models, International Conference on Machine Learning, 2017.

T.-H. Wen, M. Gasic, N. Mrksic, P.-H. Su, D. Vandyke et al., Semantically conditioned LSTM-based natural language generation for spoken dialogue systems, Conference on Empirical Methods in Natural Language Processing, p.29, 2015.

J. J. Weng, N. Ahuja, and T. S. Huang, Learning recognition and segmentation of 3-D objects from 2-D images, International Conference on Computer Vision, 1993.

W. Wiesemann, D. Kuhn, and B. Rustem, Robust Markov Decision Processes, Mathematics of Operations Research, vol.70, 2013.

J. D. Williams, P. Poupart, and S. Young, Partially observable Markov decision processes with continuous observations for dialogue management, Recent Trends in Discourse and Dialogue, vol.42, 2008.

J. D. Williams, A. Raux, D. Ramachandran, and A. W. Black, The Dialog State Tracking Challenge, Conference of the Special Interest Group on Discourse and Dialogue, vol.29, 2013.

R. J. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, p.43, 1992.

S. Worswick, vol.22, 2005.

R. Yan, "Chitty-Chitty-Chat Bot": Deep Learning for Conversational AI, International Joint Conference on Artificial Intelligence, vol.30, 2018.

Z. Yan, N. Duan, P. Chen, M. Zhou, J. Zhou et al., Building Task-Oriented Dialogue Systems for Online Shopping, Conference on Artificial Intelligence of the Association for the Advancement of Artificial Intelligence, 2017.

Y. N. Dauphin, G. Tur, D. Hakkani-Tür, and L. Heck, Zero-shot learning and clustering for semantic utterance classification using deep learning, International Conference on Learning Representations, 2014.

K. Yao, B. Peng, S. Zhang, D. Yu, G. Zweig et al., Spoken language understanding using long short-term memory neural networks, IEEE Spoken Language Technology Workshop, p.28, 2014.

K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu, Recurrent neural networks for language understanding, Conference of the International Speech Communication Association, 2013.

S. J. Young, M. Gasic, B. Thomson, and J. D. Williams, POMDP-Based Statistical Spoken Dialog Systems: A Review, Proceedings of the IEEE, p.21, 2013.

S. Young, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann et al., The Hidden Information State Model: a practical framework for POMDP-based spoken dialogue management, Computer Speech and Language, p.29, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00598186

J. Zhang, J. T. Springenberg, J. Boedecker, and W. Burgard, Deep reinforcement learning with successor features for navigation across similar environments, International Conference on Intelligent Robots and Systems, p.116, 2017.


X. Zhang and H. Wang, A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding, International Joint Conference on Artificial Intelligence, 2016.

V. Zue, S. Seneff, J. Glass, J. Polifroni et al., Jupiter: A Telephone-Based Conversational Interface for Weather Information, IEEE Transactions on Speech and Audio Processing, 2000.