
References

P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proc. of ICML, p.109, 2004.

H. Ai and D. Litman, Assessing user simulation for dialog systems using human judges and automatic evaluation measures, Proc. of the 46th Annual Meeting of the Association for Computational Linguistics, pp.622-629, 2008.

J. Allen, Natural language understanding, p.12, 1995.

I. Altman and D. Taylor, Social penetration: The development of interpersonal relationships, p.134, 1973.

K. J. Astrom, Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, vol.10, issue.1, pp.174-205, 1965.
DOI : 10.1016/0022-247X(65)90154-X

J. L. Austin, How to Do Things with Words: Second Edition (William James Lectures), p.12, 1975.
DOI : 10.1093/acprof:oso/9780198245537.001.0001

L. Baxter and D. Braithwaite, Engaging Theories in Interpersonal Communication: Multiple Perspectives, p.134, 2008.
DOI : 10.4135/9781483329529

R. Bellman, Dynamic Programming, pp.28-30, 1957.

R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics, pp.679-684, 1957.

R. Bellman and S. Dreyfus, Functional approximation and dynamic programming, Mathematical Tables and Other Aids to Computation, 1959.

C. R. Berger, Uncertain Outcome Values in Predicted Relationships: Uncertainty Reduction Theory Then and Now, Human Communication Research, vol.13, issue.1, pp.34-38, 1986.

A. Boularias, J. Kober, and J. Peters, Relative entropy inverse reinforcement learning, Journal of Machine Learning Research -Proceedings Track, vol.15, issue.109, pp.182-189, 2011.

S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.

S. Chandramohan and O. Pietquin, User and Noise Adaptive Dialogue Management Using Hybrid System Actions, In Spoken Dialogue Systems for Ambient Environments Lecture Notes in Artificial Intelligence (LNAI), vol.6392, pp.13-24, 2010.
DOI : 10.1007/978-3-642-16202-2_2

URL : https://hal.archives-ouvertes.fr/hal-00552848

S. Chandramohan, M. Geist, and O. Pietquin, Optimizing Spoken Dialogue Management with Fitted Value Iteration, Proc. of InterSpeech, p.82, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184

S. Chandramohan, M. Geist, and O. Pietquin, Sparse Approximate Dynamic Programming for Dialog Management, Proc. of SIGDial, p.82, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553180

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, User Simulation in Dialogue Systems using Inverse Reinforcement Learning, Proc. of Interspeech, p.122, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652446

S. Chandramohan, M. Geist, and O. Pietquin, Apprentissage par Renforcement Inverse pour la Simulation d'Utilisateurs dans les Systèmes de Dialogue, in Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, p.7, 2011.

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, Behavior Specific User Simulation in Spoken Dialogue Systems, Proc. of the IEEE ITG Conference on Speech Communication, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749421

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, Clustering behaviors of Spoken Dialogue Systems users, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6289038

URL : https://hal.archives-ouvertes.fr/hal-00685009

G. Chung, S. Seneff, and C. Wang, Automatic acquisition of names using speak and spell mode in spoken dialogue systems, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology , NAACL '03, 2003.
DOI : 10.3115/1073445.1073450

L. Daubigney, M. Gasic, S. Chandramohan, M. Geist, O. Pietquin et al., Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system, Proc. of Interspeech, pp.1301-1304, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652194

R. Dearden, N. Friedman, and S. J. Russell, Bayesian Q-learning, Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), pp.761-768, 1998.

G. Doddington, Automatic evaluation of machine translation quality using n-gram co-occurrence statistics, Proc. of the Human Language Technology Conference (HLT), 2002.

T. Dutoit, An introduction to text-to-speech synthesis, p.13, 1997.
DOI : 10.1007/978-94-011-5730-8

W. Eckert, E. Levin, and R. Pieraccini, User modeling for spoken dialogue system evaluation, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, pp.80-87, 1997.
DOI : 10.1109/ASRU.1997.658991

Y. Engel, S. Mannor, and R. Meir, The Kernel Recursive Least-Squares Algorithm, IEEE Transactions on Signal Processing, vol.52, issue.8, pp.2275-2285, 2004.
DOI : 10.1109/TSP.2004.830985

A. M. Farahmand and C. Szepesvári, Model selection in reinforcement learning, Machine Learning, pp.1-34, 2011.
DOI : 10.1007/s10994-011-5254-7

G. Ferguson, J. Allen, and B. Miller, TRAINS-95: Towards a mixed-initiative planning assistant, Proceedings of the 3rd Conference on AI Planning Systems, p.20, 1996.

M. Frampton and O. Lemon, Recent research advances in Reinforcement Learning in Spoken Dialogue Systems, The Knowledge Engineering Review, vol.24, issue.4, pp.375-408, 2009.

M. Gasic, F. Jurcicek, B. Thomson, K. Yu, and S. Young, Online policy optimisation of spoken dialogue systems via live interaction with human subjects, Proc. of ASRU 2011, p.133, 2011.

M. Geist, Optimisation des chaînes de production dans l'industrie sidérurgique : une approche statistique de l'apprentissage par renforcement, p.92, 2009.

M. Geist and O. Pietquin, A Brief Survey of Parametric Value Function Approximation, p.62, 2010.

M. Geist and O. Pietquin, Kalman Temporal Differences, Journal of Artificial Intelligence Research, vol.39, issue.62, pp.483-532, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00351297

M. Geist and O. Pietquin, Managing Uncertainty within Value Function Approximation in Reinforcement Learning, Active Learning and Experimental Design workshop, p.92, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00554398

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: Uncertainty and Value Function Approximation, NIPS Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00351298

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.185-192, 2009.
DOI : 10.1109/ADPRL.2009.4927543

URL : https://hal.archives-ouvertes.fr/hal-00380870

K. Georgila, J. Henderson, and O. Lemon, Learning user simulations for information state update dialogue systems, Proc. Interspeech '05, p.48, 2005.

K. Georgila, J. Henderson, and O. Lemon, User simulation for spoken dialogue systems: Learning and evaluation, Proc. International Conference on Spoken Language Processing (Interspeech/ICSLP), p.74, 2006.

A. Geramifard, M. Bowling, and R. S. Sutton, Incremental least-squares temporal difference learning, Proc. of AAAI, pp.356-361, 2006.

J. Gergonne, The application of the method of least squares to the interpolation of sequences, Historia Mathematica, pp.439-447, 1974.
DOI : 10.1016/0315-0860(74)90034-2

J. Glass and S. Seneff, Flexible and personalizable mixed-initiative dialogue systems, Proc. of the HLT-NAACL 2003 workshop on Research directions in dialogue processing, 2003.

J. Götze, T. Scheffler, R. Roller, and N. Reithinger, User simulation for the evaluation of bus information systems, 2010 IEEE Spoken Language Technology Workshop, p.46, 2010.
DOI : 10.1109/SLT.2010.5700895

S. Hahn, M. Dinarelli, C. Raymond, F. Lefèvre, P. Lehnen et al., Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1569-1583, 2011.
DOI : 10.1109/TASL.2010.2093520

URL : https://hal.archives-ouvertes.fr/hal-00746965

J. Henderson and O. Lemon, Mixture model POMDPs for efficient handling of uncertainty in dialogue management, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers, HLT '08, pp.73-76, 2008.
DOI : 10.3115/1557690.1557710

S. Janarthanam and O. Lemon, Learning Adaptive Referring Expression Generation Policies for Spoken Dialogue Systems using Reinforcement Learning, Proceedings SemDial'09, p.25, 2009.

F. Jelinek, Statistical Methods for Speech Recognition, p.22, 1998.

S. Jung, C. Lee, K. Kim, M. Jeong, and G. G. Lee, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech & Language, vol.23, issue.4, pp.479-509, 2009.
DOI : 10.1016/j.csl.2009.03.002

A. Jönsson, Dialogue management for natural language interfaces: an empirical approach, Ph.D. thesis, Linköping University, 1993.

L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: a survey, Journal of Artificial Intelligence Research, vol.4, pp.237-285, 1996.

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

S. Keizer, M. Gasic, F. Jurcicek, F. Mairesse, B. Thomson et al., Parameter estimation for agenda-based user simulation, Proc. of SIGDIAL, 2010.

S. Keizer, S. Rossignol, S. Chandramohan, and O. Pietquin, User Simulation in the Development of Statistical Spoken Dialogue Systems, Data driven methods for Adaptive Spoken Dialogue Systems, 2012.
DOI : 10.1007/978-1-4614-4803-7_4

URL : https://hal.archives-ouvertes.fr/hal-00771701

J. Z. Kolter and A. Y. Ng, Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.92, 2009.
DOI : 10.1145/1553374.1553441

S. Kullback and R. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics, vol.22, issue.1, pp.79-86, 1951.
DOI : 10.1214/aoms/1177729694

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, issue.114, pp.1107-1149, 2003.

S. Larsson and D. Traum, Information state and dialogue management in the TRINDI dialogue move engine toolkit, Natural Language Engineering, vol.6, issue.3&4, pp.323-340, 2000.
DOI : 10.1017/S1351324900002539

A. Lee and M. Przybocki, NIST Machine translation evaluation official results. Official release of automatic evaluation scores for all submissions, p.54, 2005.

F. Lefèvre, Dynamic Bayesian Networks and Discriminative Classifiers for Multi-Stage Semantic Interpretation, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, p.22, 2007.
DOI : 10.1109/ICASSP.2007.367151

F. Lefèvre and R. De Mori, Unsupervised state clustering for stochastic dialog management, Proc. of ASRU, 2007.

O. Lemon, Learning what to say and how to say it: Joint optimisation of spoken dialogue management and natural language generation, Computer Speech & Language, pp.210-221, 2011.

O. Lemon and O. Pietquin, Machine learning for spoken dialogue systems, Proc. of InterSpeech'07, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00216035

O. Lemon, K. Georgila, J. Henderson, and M. Stuttle, An ISU dialogue system exhibiting reinforcement learning of dialogue policies, Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations on, EACL '06, p.76, 2006.
DOI : 10.3115/1608974.1608986

E. Levin and R. Pieraccini, Using Markov decision process for learning dialogue strategies, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), p.42, 1998.
DOI : 10.1109/ICASSP.1998.674402

E. Levin, R. Pieraccini, and W. Eckert, A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000.
DOI : 10.1109/89.817450

L. Li, S. Balakrishnan, and J. Williams, Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection, Proc. of the International Conference on Speech Communication and Technologies (InterSpeech'09), p.82, 2009.

R. López-Cózar, Z. Callejas, and M. F. McTear, Testing the performance of spoken dialogue systems by means of an artificially simulated user, Artificial Intelligence Review, vol.12, issue.2, pp.291-323, 2006.
DOI : 10.1007/s10462-007-9059-9

N. Mehta, R. Gupta, A. Raux, D. Ramachandran, and S. Krawczyk, Probabilistic Ontology Trees for Belief Tracking in Dialog Systems, Proc. of the SIGDIAL 2010 Conference, pp.37-46, 2010.

A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, Proc. of ICML, 2000.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: A method for automatic evaluation of machine translation, Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), p.54, 2002.

J. Park and I. Sandberg, Universal Approximation Using Radial-Basis-Function Networks, Neural Computation, vol.3, issue.2, pp.246-257, 1991.
DOI : 10.1162/neco.1991.3.2.246

O. Pietquin, A Probabilistic Description of Man-Machine Spoken Communication, 2005 IEEE International Conference on Multimedia and Expo, pp.410-413, 2005.
DOI : 10.1109/ICME.2005.1521447

O. Pietquin, Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation, Proc. of ICME, pp.425-428, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00215968

O. Pietquin and T. Dutoit, A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006.
DOI : 10.1109/TSA.2005.855836

URL : https://hal.archives-ouvertes.fr/hal-00207952

O. Pietquin and H. Hastie, A survey on metrics for the evaluation of user simulations, The Knowledge Engineering Review, p.116, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00771654

O. Pietquin, M. Geist, and S. Chandramohan, Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, pp.1-7, 2011.
DOI : 10.1145/1966407.1966412

URL : https://hal.archives-ouvertes.fr/hal-00617517

O. Pietquin, S. Rossignol, and M. Ianotto, Training Bayesian networks for realistic man-machine spoken dialogue simulation, Proc. of IWSDS, p.49, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00448636

F. Pinault and F. Lefèvre, Semantic graph clustering for POMDP-based spoken dialog systems, Proc. of Interspeech, pp.1321-1324, 2011.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, pp.31-34, 1994.
DOI : 10.1002/9780470316887

L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, p.12, 1993.

E. Reiter and R. Dale, Building natural language generation systems, p.13, 2000.
DOI : 10.1017/CBO9780511519857

URL : http://arxiv.org/abs/cmp-lg/9605002

V. Rieser, Bootstrapping Reinforcement Learning-based Dialogue Strategies from Wizard-of-Oz data, p.102, 2008.

V. Rieser and O. Lemon, Simulations for learning dialogue strategies, Proc. of Interspeech, p.54, 2006.

V. Rieser and O. Lemon, Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation, Theory and Applications of Natural Language Processing, p.132, 2011.
DOI : 10.1007/978-3-642-24942-6

N. Roy, J. Pineau, and S. Thrun, Spoken dialogue management using probabilistic reasoning, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics , ACL '00, p.13, 2000.
DOI : 10.3115/1075218.1075231

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.8204

Y. Sakaguchi and M. Takano, Reliability of internal prediction/estimation and its application. I. Adaptive action selection reflecting reliability of value function, Neural Networks, vol.17, issue.7, pp.935-952, 2004.
DOI : 10.1016/j.neunet.2004.05.004

A. L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, vol.3, pp.210-229, 1959.

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, Proc. of ASRU, 2005.

J. Schatzmann, K. Weilhammer, M. Stuttle, and S. Young, A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, vol.21, issue.02, pp.97-126, 2006.
DOI : 10.1017/S0269888906000944

J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young, Agenda-based user simulation for bootstrapping a POMDP dialogue system, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers on XX, NAACL '07, p.102, 2007.
DOI : 10.3115/1614108.1614146

S. Singh, M. Kearns, D. Litman, and M. Walker, Reinforcement learning for spoken dialogue systems, Proc. of NIPS, p.42, 1999.

E. J. Sondik, The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, Operations Research, vol.26, issue.2, pp.282-304, 1978.
DOI : 10.1287/opre.26.2.282

A. L. Strehl and M. L. Littman, An analysis of model-based Interval Estimation for Markov Decision Processes, Journal of Computer and System Sciences, vol.74, issue.8, p.91, 2006.
DOI : 10.1016/j.jcss.2007.08.009

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, pp.35-36, 1998.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, vol.10, pp.1633-1685, 2009.

M. Theune, Natural language generation for dialogue: system survey, p.25, 2003.

C. J. Van-rijsbergen, Information Retrieval, Butterworths, p.52, 1979.

K. VanLehn, P. Jordan, and D. Litman, Developing pedagogically effective tutorial dialogue tactics: Experiments and a testbed, Proc. of SLaTE Workshop on Speech and Language Technology in Education, 2007.

M. A. Walker, D. J. Litman, C. A. Kamm, and A. Abella, PARADISE: A framework for evaluating spoken dialogue agents, Proc. of the 35th Annual Meeting of the Association for Computational Linguistics (ACL'97), pp.271-280, 1997.

C. J. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, pp.279-292, 1992.

Y. Wilks, Artificial companions, Proceedings of the 1st International Workshop on Machine Learning for Multimodal Interaction, p.19, 2004.
DOI : 10.1179/030801805X25945

J. D. Williams and S. Young, Partially observable Markov decision processes for spoken dialog systems, Computer Speech & Language, vol.21, issue.2, pp.393-422, 2007.
DOI : 10.1016/j.csl.2006.06.008

X. Xu, D. Hu, and X. Lu, Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol.18, issue.4, pp.973-992, 2007.
DOI : 10.1109/TNN.2007.899161

I. Zukerman and D. Albrecht, Predictive statistical models for user modeling, User Modeling and User-Adapted Interaction, vol.11, issue.1/2, pp.5-18, 2001.
DOI : 10.1023/A:1011175525451