Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998. ,
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Conference On Learning Theory (COLT 06), pp.574-588, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, in : Advances in Neural Information Processing Systems (NIPS 19), pp.49-56, 2007. ,
Advantage Updating, Technical Report WL-TR-93-1146, 1993. ,
Residual Algorithms : Reinforcement Learning with Function Approximation, Proceedings of the International Conference on Machine Learning (ICML 95), pp.30-37, 1995. ,
A Markovian Decision Process, Indiana University Mathematics Journal, vol.6, issue.4, pp.679-684, 1957. ,
DOI : 10.1512/iumj.1957.6.56038
Dynamic Programming, 1957. ,
Polynomial approximation - a new computational technique in dynamic programming : allocation processes, Mathematics of Computation, vol.17, pp.155-161, 1963. ,
The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, pp.819-840, 2002. ,
DOI : 10.1287/moor.27.4.819.297
Bertsekas : Dynamic Programming and Optimal Control, Athena Scientific, 1995. ,
Tsitsiklis : Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), Athena Scientific, 1996. ,
Incremental Natural Actor-Critic Algorithms, Conference on Neural Information Processing Systems (NIPS), 2007. ,
Neural Networks for Pattern Recognition, 1995. ,
Controlled diffusion processes, Probability Surveys, pp.213-244, 2005. ,
Boyan : Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning (ICML 99), pp.49-56, 1999. ,
Technical Update : Least-Squares Temporal Difference Learning, Machine Learning, pp.233-246, 1999. ,
Barto : Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
R-max -A general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol.3, pp.213-231, 2002. ,
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems, pp.207-239, 2006. ,
Model-Based Bayesian Exploration, Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pp.150-165, 1999. ,
Bayesian Q-Learning, Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), pp.761-768, 1998. ,
Chi-square Tests Driven Method for Learning the Structure of Factored MDPs, Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 06), pp.122-129, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-01351133
Estimates of Parameter Distributions for Optimal Action Selection, Technical Report IDIAP-RR 04-72, Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), 2005. ,
Optimal learning : Computational procedures for Bayes-adaptive Markov decision processes, PhD thesis, 2002. ,
Algorithms and Representations for Reinforcement Learning, PhD thesis, 2005. ,
Bayes Meets Bellman : The Gaussian Process Approach to Temporal Difference Learning, Proceedings of the International Conference on Machine Learning (ICML 03), pp.154-161, 2003. ,
The Kernel Recursive Least-Squares Algorithm, IEEE Transactions on Signal Processing, vol.52, issue.8, pp.2275-2285, 2004. ,
DOI : 10.1109/TSP.2004.830985
Reinforcement learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning , ICML '05, 2005. ,
DOI : 10.1145/1102351.1102377
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005. ,
Neural network training with the nprKF, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222), pp.109-114, 2001. ,
DOI : 10.1109/IJCNN.2001.939001
Puskorius : A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification, Proceedings of the IEEE, pp.2259-2277, 1998. ,
Metrics for Finite Markov Decision Processes, Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 04), pp.162-178, 2004. ,
Metrics for Markov Decision Processes with Infinite State Spaces, Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 05), p.201, 2005. ,
Methods for computing state similarity in Markov Decision Processes, Proceedings of the 22nd Conference on Uncertainty in Artificial intelligence (UAI 06), 2006. ,
A Sparse Nonlinear Bayesian Online Kernel Regression, 2008 The Second International Conference on Advanced Engineering Computing and Applications in Sciences, pp.199-204, 2008. ,
DOI : 10.1109/ADVCOMP.2008.7
URL : https://hal.archives-ouvertes.fr/hal-00327081
Bayesian Reward Filtering, in S. Girgin et al., editors : Proceedings of the European Workshop on Reinforcement Learning, pp.96-109, 2008. ,
DOI : 10.1007/978-3-540-89722-4_8
URL : https://hal.archives-ouvertes.fr/hal-00351282
Filtrage bayésien de la récompense, Proceedings of the Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, pp.113-122, 2008. ,
Kalman Temporal Differences : Uncertainty and Value Function Approximation, NIPS Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00351298
Online Bayesian kernel regression from nonlinear mapping of observations, 2008 IEEE Workshop on Machine Learning for Signal Processing, 2008. ,
DOI : 10.1109/MLSP.2008.4685498
URL : https://hal.archives-ouvertes.fr/hal-00335052
Différences Temporelles de Kalman, Proceedings of the Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2009. ,
Différences Temporelles de Kalman : le cas stochastique, Proceedings of the Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2009. ,
From Supervised to Reinforcement Learning : a Kernel-based Bayesian Filtering Framework, International Journal On Advances in Software, vol.2, issue.1, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00429891
Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00380870
Kernelizing Vector Quantization Algorithms, European Symposium on Artificial Neural Networks, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00429892
Tracking in Reinforcement Learning, Proceedings of the 16th International Conference on Neural Information Processing, 2009. ,
DOI : 10.1007/978-3-642-10677-4_57
URL : https://hal.archives-ouvertes.fr/hal-00439316
Astuce du Noyau & Quantification Vectorielle, Proceedings of the 17ème colloque sur la Reconnaissance des Formes et l'Intelligence Artificielle (RFIA'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00553114
Incremental Least-Squares Temporal Difference Learning, 21st Conference of the American Association for Artificial Intelligence (AAAI 06), pp.356-361, 2006. ,
Learning dynamic Bayesian networks, Adaptive Processing of Sequences and Data Structures, International Summer School on Neural Networks, "E.R. Caianiello"-Tutorial Lectures, pp.168-197, 1998. ,
DOI : 10.1007/BFb0053999
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.462.2249
Stable Function Approximation in Dynamic Programming, Proceedings of the International Conference on Machine Learning (ICML 95), 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
Kalman Filtering : Theory and Practice, 1993. ,
Adaptive importance sampling for value function approximation in off-policy reinforcement learning, Neural Networks, vol.22, issue.10, 2009. ,
DOI : 10.1016/j.neunet.2009.01.002
Dynamic Programming and Markov Processes, 1960. ,
Consistent Normalized Least Mean Square Filtering with Noisy Data Matrix, IEEE Transactions on Signal Processing, vol.53, issue.6, pp.2112-2123, 2005. ,
Uhlmann : Unscented filtering and nonlinear estimation, Proceedings of the IEEE, pp.401-422, 2004. ,
The scaled unscented transformation, American Control Conference, pp.4555-4559, 2002. ,
Uhlmann : A new extension of the Kalman filter to nonlinear systems, Int. Symp. Aerospace/Defense Sensing, Simul. and Controls 3, 1997. ,
Feature Selection for Value Function Approximation Using Bayesian Model Selection, Proceedings of European Conference on Machine Learning, 2009. ,
Learning in embedded systems, 1993. ,
Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998. ,
DOI : 10.1016/S0004-3702(98)00023-X
Exploration in Metric State Spaces, International Conference on Machine Learning (ICML 03), pp.306-312, 2003. ,
Near-Optimal Reinforcement Learning in Polynomial Time, Machine Learning, pp.209-232, 2002. ,
Automatic basis function construction for approximate dynamic programming and reinforcement learning, Proceedings of the 23rd international conference on Machine learning (ICML 06), pp.449-456, 2006. ,
Time-varying parameter models with endogenous regressors, Economics Letters, vol.91, issue.1, pp.21-26, 2006. ,
DOI : 10.1016/j.econlet.2005.10.007
An Analysis of Actor-Critic Algorithms Using Eligibility Traces : Reinforcement Learning with Imperfect Value Function, Proceedings of the Fifteenth International Conference on Machine Learning (ICML 98), pp.278-286, 1998. ,
Ng : Near-Bayesian Exploration in Polynomial Time, Proceedings of the 26th international conference on Machine learning (ICML 09), 2009. ,
Actor-Critic Algorithms, Advances in Neural Information Processing Systems (NIPS 12), 2000. ,
Tsitsiklis : On Actor-Critic Algorithms, SIAM Journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003. ,
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Efficient Exploration With Latent Structure, Robotics: Science and Systems I, 2005. ,
DOI : 10.15607/RSS.2005.I.011
Online exploration in least-squares policy iteration, Proceedings of the Conference for research in autonomous agents and multi-agent systems (AAMAS-09), 2009. ,
The Witness Algorithm : Solving Partially Observable Markov Decision Processes, 1994. ,
Efficient dynamic-programming updates in partially observable Markov decision processes, Technical Report CS-95-19, 1995. ,
Proto-value Functions : A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2006. ,
Utilizing the Natural Gradient in Temporal Difference Reinforcement Learning with Eligibility Traces, 2nd International Symposium on Information Geometry and its Applications, pp.256-263, 2005. ,
Kernel-Based Reinforcement Learning, Machine Learning, pp.161-178, 2002. ,
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008. ,
DOI : 10.1016/j.neucom.2007.11.026
Natural Actor-Critic, in : Proceedings of the European Conference on Machine Learning (ECML 2005), Lecture Notes in Artificial Intelligence, 2005. ,
Reinforcement Learning for Humanoid Robotics, Third IEEE-RAS International Conference on Humanoid Robots, 2003. ,
Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, Proceedings of the International Conference on Machine Learning (ICML 07), 2007. ,
An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.697-704, 2006. ,
DOI : 10.1145/1143844.1143932
Eligibility Traces for Off-Policy Policy Evaluation, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00), pp.759-766, 2000. ,
Feature discovery in approximate dynamic programming, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927533
URL : https://hal.archives-ouvertes.fr/hal-00351144
Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.279-297, 1994. ,
DOI : 10.1109/72.279191
Roles of learning rates, artificial process noise and square root filtering for extended Kalman filter training, Proceedings of the International Joint Conference on Neural Networks (IJCNN 99), pp.1809-1814, 1999. ,
Markov Decision Processes : Discrete Stochastic Dynamic Programming, 1994. ,
Gaussian Processes for Machine Learning, 2006. ,
A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models, Neurocomputing, vol.20, issue.1-3, pp.279-294, 1998. ,
DOI : 10.1016/S0925-2312(98)00021-6
URL : https://hal.archives-ouvertes.fr/hal-00797391
Reliability of internal prediction/estimation and its application. I. Adaptive action selection reflecting reliability of value function, Neural Networks, vol.17, issue.7, pp.935-952, 2004. ,
DOI : 10.1016/j.neunet.2004.05.004
On the bias of batch Bellman residual minimisation, Neurocomputing, vol.72, issue.7-9, 2009. ,
DOI : 10.1016/j.neucom.2008.11.024
Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, Neural Information Processing Systems(NIPS 15), 2002. ,
Learning with Kernels : Support Vector Machines, Regularization, Optimization, and Beyond, 2001. ,
A Neural Substrate of Prediction and Reward, Science, vol.275, issue.5306, pp.1593-1599, 1997. ,
DOI : 10.1126/science.275.5306.1593
Processus décisionnels de Markov en intelligence artificielle, 2008. ,
Optimal State Estimation : Kalman, H Infinity, and Nonlinear Approaches, 2006. ,
DOI : 10.1002/0470045345
Approximate solutions of the nonlinear filtering problem, 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications, pp.620-625, 1977. ,
DOI : 10.1109/CDC.1977.271646
Littman : PAC Model-Free Reinforcement Learning, 23rd International Conference on Machine Learning, pp.881-888, 2006. ,
DOI : 10.1145/1143844.1143955
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.326
Littman : An Empirical Evaluation of Interval Estimation for Markov Decision Processes, 16th IEEE International on Tools with Artificial Intelligence Conference, pp.128-135, 2004. ,
Littman : An Analysis of Model-Based Interval Estimation for Markov Decision Processes, Journal of Computer and System Sciences, 2006. ,
A Bayesian Framework for Reinforcement Learning, Proceedings of the 17th International Conference on Machine Learning, pp.943-950, 2000. ,
Reinforcement Learning : An Introduction (Adaptive Computation and Machine Learning), 1998. ,
On the role of tracking in stationary environments, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.871-878, 2007. ,
DOI : 10.1145/1273496.1273606
Singh and Yishay Mansour : Policy Gradient Methods for Reinforcement Learning with Function Approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999. ,
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999. ,
DOI : 10.1016/S0004-3702(99)00052-1
Adaptive Classification by Variational Kalman Filtering, Neural Information Processing Systems (NIPS 15), 2002. ,
Instrumental variable methods for system identification, Circuits, Systems, and Signal Processing, pp.1-9, 2002. ,
DOI : 10.1007/BFb0009019
Temporal difference learning and TD-Gammon, Communications of the ACM, March 1995. ,
DOI : 10.1145/203330.203343
On the likelihood that one unknown probability exceeds another in view of two samples, Biometrika, issue.25, pp.285-294, 1933. ,
Educational psychology : the psychology of learning, 1913. ,
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, pp.674-690, 1997. ,
A semiparametric statistical approach to model-free policy evaluation, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390291
Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, PhD thesis, 2004. ,
The Unscented Particle Filter, Technical Report CUED, 2000. ,
Statistical Learning Theory, 1998. ,
The unscented Kalman filter for nonlinear estimation, Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE, pp.153-158, 2000. ,
Bayesian sparse sampling for on-line reward optimization, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.956-963, 2005. ,
DOI : 10.1145/1102351.1102472
Learning from Delayed Rewards, PhD thesis, 1989. ,
Efficient model-based exploration, Proceedings of the fifth international conference on simulation of adaptive behavior on From animals to animats 5, pp.223-228, 1998. ,
The QV family compared to other reinforcement learning algorithms, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927532
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992. ,
Basis function adaptation methods for cost approximation in MDP, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927528
Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, vol.10, issue.1, pp.174-205, 1965. ,
DOI : 10.1016/0022-247X(65)90154-X