MoHex wins Hex tournament, ICGA Journal, pp.114-116, 2009. ,
DOI : 10.3233/icg-2009-32218
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.2612
Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, vol.10, issue.1, pp.174-205, 1965. ,
DOI : 10.1016/0022-247X(65)90154-X
Minimax policies for adversarial and stochastic bandits, proceedings of the Annual Conference on Learning Theory (COLT), 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00834882
Use of variance estimation in the multi-armed bandit problem, NIPS 2006 Workshop on On-line Trading of Exploration and Exploitation, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00203496
Best Arm Identification in Multi-Armed Bandits, COLT 2010 - Proceedings, 13 p., 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00654404
Grid Coevolution for Adaptive Simulations: Application to the Building of Opening Books in the Game of Go, Proceedings of EvoGames, pp.323-332, 2009. ,
DOI : 10.1007/978-3-642-01129-0_36
URL : https://hal.archives-ouvertes.fr/inria-00369783
Using confidence bounds for exploitation-exploration trade-offs, The Journal of Machine Learning Research, vol.3, pp.397-422, 2003. ,
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of IEEE 36th Annual Foundations of Computer Science, pp.322-331, 1995. ,
DOI : 10.1109/SFCS.1995.492488
Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, 2002. ,
Improved Rates for the Stochastic Continuum-Armed Bandit Problem, Lecture Notes in Computer Science, vol.4539, pp.454-468, 2007. ,
DOI : 10.1007/978-3-540-72927-3_33
Continuous Upper Confidence Trees with Polynomial Exploration - Consistency, ECML/PKDD 2013, 2013. ,
DOI : 10.1007/978-3-642-40988-2_13
URL : https://hal.archives-ouvertes.fr/hal-00835352
The *-minimax search procedure for trees containing chance nodes, Artificial Intelligence, vol.21, issue.3, pp.327-350, 1983. ,
A genetic search in policy space for solving Markov decision processes, AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, 1999. ,
Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, vol.13, issue.5, pp.834-846, 1983. ,
DOI : 10.1109/tsmc.1983.6313077
Dynamic Programming, 1957. ,
Partitioning procedures for solving mixed-variables programming problems, Numerische Mathematik, vol.38, issue.1, pp.238-252, 1962. ,
DOI : 10.1007/BF01386316
Using a Financial Training Criterion Rather than a Prediction Criterion, CIRANO Working Papers 98s-21, 1998. ,
DOI : 10.1142/S0129065797000422
Dynamic Programming and Optimal Control, vols I and II, Athena Scientific, 1995. ,
Adaptive Robust Optimization for the Security Constrained Unit Commitment Problem, IEEE Transactions on Power Systems, vol.28, issue.1, pp.52-63, 2013. ,
DOI : 10.1109/TPWRS.2012.2205021
Olivier Teytaud, and Paul Vayssière. Parameter Tuning by Simple Regret Algorithms and Multiple Simultaneous Hypothesis Testing, ICINCO 2010, p.10, 2010. ,
Move-Pruning Techniques for Monte-Carlo Go, Advances in Computer Games 11, 2005. ,
DOI : 10.1007/11922155_8
Technical update: Least-squares temporal difference learning, Machine Learning, pp.233-246, 2002. ,
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
Pure exploration in finitely-armed and continuous-armed bandits, Theoretical Computer Science, vol.412, issue.19, pp.1832-1852, 2011. ,
DOI : 10.1016/j.tcs.2010.12.059
URL : https://hal.archives-ouvertes.fr/hal-00609550
Online optimization in X-armed bandits, NIPS, pp.201-208, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00329797
X-Armed Bandits, Journal of Machine Learning Research, vol.12, pp.1655-1695, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00450235
Optimistic Heuristics for MineSweeper, ICS - International Computer Symposium 2012, Smart Innovation, Systems and Technologies, pp.199-207, 2012. ,
DOI : 10.1007/978-3-642-35452-6_22
URL : https://hal.archives-ouvertes.fr/hal-00750577
Reinforcement learning and dynamic programming using function approximators, CRC Press, vol.39, 2010. ,
DOI : 10.1201/9781439821091
URL : http://orbi.ulg.ac.be/jspui/handle/2268/27963
Golois wins Phantom Go tournament, ICGA Journal, vol.30, issue.3, pp.165-166, 2007. ,
DOI : 10.3233/icg-2009-32110
On the parallelization of UCT, Proceedings of CGW07, pp.93-101, 2007. ,
A Phantom-Go Program, ACG, pp.120-125, 2006. ,
DOI : 10.1007/11922155_9
Progressive Strategies for Monte-Carlo Tree Search, Proceedings of the 10th Joint Conference on Information Sciences, pp.655-661, 2007. ,
Parallel Monte-Carlo Tree Search, Proceedings of the Conference on Computers and Games, 2008. ,
DOI : 10.1007/978-3-540-87608-3_6
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.159.4373
Monte-Carlo tree search: A new framework for game AI, 2008. ,
Transpositions and move groups in Monte Carlo tree search, 2008. ,
Strategic Choices: Small Budgets and Simple Regret, 2012 Conference on Technologies and Applications of Artificial Intelligence, 2012. ,
DOI : 10.1109/TAAI.2012.35
URL : https://hal.archives-ouvertes.fr/hal-00753145
Improving the Exploration in Upper Confidence Trees, Learning and Intelligent OptimizatioN Conference LION 6, 2012. ,
DOI : 10.1007/978-3-642-34413-8_29
URL : https://hal.archives-ouvertes.fr/hal-00745208
Continuous Upper Confidence Trees, LION'11: Proceedings of the 5th International Conference on Learning and Intelligent OptimizatioN, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00835352
Continuous Rapid Action Value Estimates, The 3rd Asian Conference on Machine Learning (ACML2011) Conference Proceedings, pp.19-31, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00642459
Consistent Belief State Estimation, with Application to Mines, 2011 International Conference on Technologies and Applications of Artificial Intelligence, 2011. ,
DOI : 10.1109/TAAI.2011.55
URL : https://hal.archives-ouvertes.fr/hal-00712388
Learning a move-generator for upper confidence trees, Advances in Intelligent Systems and Applications, Smart Innovation, Systems and Technologies, pp.209-218, 2013. ,
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the 5th International Conference on Computers and Games, 2006. ,
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Computing Elo ratings of move patterns in the game of Go, Computer Games Workshop, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00149859
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the 5th international conference on Computers and games, CG'06, pp.72-83, 2007. ,
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Rollout sampling approximate policy iteration, Machine Learning, pp.157-171, 2008. ,
DOI : 10.1007/978-3-540-87479-9_6
URL : http://arxiv.org/abs/0805.2027
Move Ordering vs Heavy Playouts: Where Should Heuristics be Applied in Monte Carlo Go, Proc. 3rd North Amer ,
Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, Proceedings of the 45th IEEE Conference on Decision and Control, pp.65-72, 2006. ,
DOI : 10.1109/CDC.2006.377527
URL : https://hal.archives-ouvertes.fr/hal-00121732
Simulation-based approach to general game playing, AAAI'08: Proceedings of the 23rd national conference on Artificial intelligence, pp.259-264, 2008. ,
Model predictive control and the optimization of power plant load while considering lifetime consumption, IEEE Transactions on, vol.17, issue.1, pp.186-191, 2002. ,
The parallelization of Monte-Carlo planning, Proceedings of the International Conference on Informatics in Control, Automation and Robotics, pp.198-203, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00287867
Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007. ,
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003
Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, vol.175, issue.11, pp.1856-1875, 2011. ,
DOI : 10.1016/j.artint.2011.03.007
Exploration exploitation in Go: UCT for Monte-Carlo Go, NIPS-2006, Online trading between exploration and exploitation Workshop, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00115330
Modification of UCT with patterns in Monte-Carlo Go, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00117266
Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, vol.65, issue.1, pp.167-198, 2006. ,
Amedeo Cesta, and Ioannis Refanidis, Proceedings of the 19th International Conference on Automated Planning and Scheduling, ICAPS 2009, 2009. ,
Markov Chain Monte Carlo in Practice, 1995. ,
Efficient nonlinear control through neuroevolution, Proceedings of the European Conference on Machine Learning, pp.654-662, 2006. ,
A sublinear-time randomized approximation algorithm for matrix games, Operations Research Letters, vol.18, issue.2, pp.53-58, 1995. ,
A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.42, issue.6, pp.1291-1307, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00756747
Error estimation and adaptive discretization for the discrete stochastic Hamilton-Jacobi-Bellman equation, Numerische Mathematik, vol.1, issue.1, pp.85-112, 2004. ,
DOI : 10.1007/s00211-004-0555-4
Multi-armed bandits, dynamic environments and meta-bandits, NIPS Workshop "Online Trading of Exploration and Exploitation", 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00113668
Search in trees with chance nodes, 2004. ,
Metareasoning for Monte Carlo tree search, 2011. ,
Fixed Point Theory: An Introduction, 2002. ,
DOI : 10.1007/978-94-009-8177-5
Meta Monte-Carlo Tree Search for Automatic Opening Book Generation, pp.7-12 ,
Parallel Monte-Carlo Tree Search with Simulation Servers, 2010 International Conference on Technologies and Applications of Artificial Intelligence, 2010. ,
DOI : 10.1109/TAAI.2010.83
Nearly tight bounds for the continuum-armed bandit problem, NIPS, 2004. ,
Monte-Carlo Opening Books for Amazons, Computers and Games, pp.124-135, 2011. ,
DOI : 10.1007/978-3-642-17928-0_12
Bandit Based Monte-Carlo Planning, 15th European Conference on Machine Learning (ECML), pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
Improved Monte-Carlo search, working paper, 2006. ,
Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.521-528, 2009. ,
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
Monte Carlo *-minimax search, CoRR, abs/1304, 2013. ,
The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments, IEEE Transactions on Computational Intelligence and AI in games, 2009. ,
Minesweeper: Where to probe? ,
URL : https://hal.archives-ouvertes.fr/hal-00723550
Meta-learning of exploration/exploitation strategies: The multi-armed bandit case, CoRR, abs/1207, 2012. ,
The cross entropy method for fast policy search, International Conference on Machine Learning, pp.512-519, 2003. ,
Sample-based planning for continuous action Markov decision processes, ICAPS. AAAI, 2011. ,
Approximate gradient methods in policy-space optimization of Markov reward processes, Discrete Event Dynamic Systems, pp.111-148, 2003. ,
Renewable Energy Futures: Targets, Scenarios, and Pathways, Annual Review of Environment and Resources, vol.32, issue.1, pp.205-239, 2007. ,
DOI : 10.1146/annurev.energy.32.080106.133554
URL : http://epub.wupperinst.org/frontdoor/index/index/docId/2781
Les Réserves et la Régulation de l'Avenir dans la vie Economique, 1946. ,
Self-Adaptation in Evolutionary Algorithms, Parameter Setting in Evolutionary Algorithms, 2007. ,
DOI : 10.1007/978-3-540-69432-8_3
Game-playing and game-learning automata, Advances in Programming and Non-numerical Computation, pp.183-196, 1966. ,
Policy gradient in continuous time, Journal of Machine Learning Research, vol.7, pp.771-791, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00117152
Variable resolution discretization in optimal control, Machine Learning, vol.49, issue.2/3, pp.291-323, 2002. ,
DOI : 10.1023/A:1017992615625
Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems, pp.79-110, 2003. ,
Stochastic Optimization of a Multireservoir Hydroelectric System: A Decomposition Approach, Water Resources Research, vol.21, issue.6, pp.779-792, 1985. ,
DOI : 10.1029/WR021i006p00779
Multi-stage stochastic optimization applied to energy planning, Mathematical Programming, vol.4, issue.1-3, pp.359-375, 1991. ,
DOI : 10.1007/BF01582895
Natural actor-critic, Neurocomputing, pp.71-78, 2008. ,
DOI : 10.1016/j.neucom.2007.11.026
Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics), 2007. ,
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994. ,
A survey of industrial model predictive control technology, Control Engineering Practice, vol.11, issue.7, 2003. ,
DOI : 10.1016/S0967-0661(02)00186-7
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, 1995. ,
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007. ,
DOI : 10.1109/ADPRL.2007.368196
Multiple Overlapping Tiles for Contextual Monte Carlo Tree Search, Evostar ,
DOI : 10.1007/978-3-642-12239-2_21
URL : https://hal.archives-ouvertes.fr/inria-00456422
Biasing Monte-Carlo Simulations through RAVE Values, The International Conference on Computers and Games, 2010. ,
DOI : 10.1007/978-3-642-17928-0_6
URL : https://hal.archives-ouvertes.fr/inria-00485555
Optimal active learning through billiards and upper confidence trees in continuous domains, Proceedings of the ECML conference, 2009. ,
Optimal robust expensive optimization is tractable, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, GECCO '09, 2009. ,
DOI : 10.1145/1569901.1570255
URL : https://hal.archives-ouvertes.fr/inria-00374910
On-line Q-learning using connectionist systems, 1994. ,
Monte-Carlo Search Techniques in the Modern Board Game Thurn and Taxis, 2010. ,
Combining Myopic Optimization and Tree Search: Application to MineSweeper, LION6, Learning and Intelligent Optimization, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00712417
Combining Myopic Optimization and Tree Search: Application to MineSweeper, LION6, Learning and Intelligent Optimization Proc. LION 6, pp.222-236, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00712417
Monte-Carlo simulation balancing, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.945-952, 2009. ,
DOI : 10.1145/1553374.1553495
Convergence results for single-step on-policy reinforcement-learning algorithms, Machine Learning, pp.287-308, 2000. ,
Stochastic Dynamic Programming for Long Term Hydrothermal Scheduling Considering Different Streamflow Models, 2006 International Conference on Probabilistic Methods Applied to Power Systems, pp.1-6, 2006. ,
DOI : 10.1109/PMAPS.2006.360203
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, pp.1-103, 2010. ,
DOI : 10.2200/S00268ED1V01Y201005AIM009
Creating an Upper-Confidence-Tree Program for Havannah, ACG 12, 2009. ,
DOI : 10.1007/978-3-642-12993-3_7
URL : https://hal.archives-ouvertes.fr/inria-00380539
On the huge benefit of decisive moves in Monte-Carlo Tree Search algorithms, Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games, 2010. ,
DOI : 10.1109/ITW.2010.5593334
URL : https://hal.archives-ouvertes.fr/inria-00495078
Including ontologies in Monte-Carlo tree search and applications - an open source platform, 2008. ,
On the use of low discrepancy sequences in Monte Carlo methods, Monte Carlo Methods and Applications, vol.2, issue.4, 1996. ,
DOI : 10.1515/mcma.1996.2.4.295
Fuzzy Q-Learning with an adaptive representation, 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), pp.720-725, 2008. ,
DOI : 10.1109/FUZZY.2008.4630449
Algorithms for infinitely many-armed bandits, Advances in Neural Information Processing Systems, 2008. ,
Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007 IEEE Symposium on Computational Intelligence and Games, pp.175-182, 2007. ,
DOI : 10.1109/CIG.2007.368095
Bandit-based planning and learning in continuous-action Markov decision processes, ICAPS. AAAI, 2012. ,
Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research, vol.7, pp.877-917, 2006. ,
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992. ,
Modark wins Chinese Dark Chess tournament, ICGA Journal, vol.33, issue.4, pp.230-231, 2010. ,
Basis function adaptation methods for cost approximation in MDP, to appear in Proceedings of the 2009 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2009. ,