S. Behnke, Online trajectory generation for omnidirectional biped walking, Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), 2006.
DOI : 10.1109/ROBOT.2006.1641935

A. Bernstein and N. Shimkin, Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains, Machine Learning, vol. 81, issue 3, pp. 359-397, 2010.

D. P. Bertsekas, Lambda-Policy Iteration: A Review and a New Implementation, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 2013.

URL : http://arxiv.org/pdf/1507.01029

H. Borchani, G. Varando, C. Bielza, and P. Larrañaga, A survey on multi-output regression, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 5, issue 5, 2015.

URL : http://oa.upm.es/40804/1/INVE_MEM_2015_204213.pdf

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, 1984.

L. Breiman, Bagging predictors, Machine Learning, vol. 24, issue 2, pp. 123-140, 1996.

E. Brochu, V. M. Cora, and N. de Freitas, A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010.

P. T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, A tutorial on the cross-entropy method, Annals of Operations Research, vol. 134, issue 1, 2005.

M. P. Deisenroth and C. E. Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, pp. 465-472, 2011.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol. 6, issue 1, pp. 503-556, 2005.

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine Learning, vol. 63, issue 1, pp. 3-42, 2006.
DOI : 10.1007/s10994-006-6226-1

URL : https://hal.archives-ouvertes.fr/hal-00341932

J. C. Gittins and D. M. Jones, A dynamic allocation index for the discounted multiarmed bandit problem, Biometrika, vol. 66, issue 3, pp. 561-565, 1979.
DOI : 10.1093/biomet/66.3.561

N. Hansen, S. D. Müller, and P. Koumoutsakos, Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES), Evolutionary Computation, vol. 11, issue 1, pp. 1-18, 2003.

N. Hansen and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol. 9, issue 2, pp. 159-195, 2001.


V. Heidrich-Meisner and C. Igel, Uncertainty handling CMA-ES for reinforcement learning, Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO '09), pp. 1211-1218, 2009.
DOI : 10.1145/1569901.1570064

L. Hofer and H. Gimbert, Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes, Proceedings of the 4th Workshop on Planning and Robotics (PlanRob) at ICAPS 2016, AAAI, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01416179

L. Hofer and Q. Rouxel, An Operational Method Toward Efficient Walk Control Policies for Humanoid Robots, ICAPS, 2017.

M. Hoffman, E. Brochu, and N. de Freitas, Portfolio Allocation for Bayesian Optimization, Conference on Uncertainty in Artificial Intelligence, pp. 327-336, 2011.

A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, Dynamical Movement Primitives: Learning Attractor Models for Motor Behaviors, Neural Computation, vol. 25, issue 2, pp. 328-373, 2013.

URL : https://infoscience.epfl.ch/record/185437/files/neco_a_00393.pdf

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by Simulated Annealing, Science, vol. 220, issue 4598, 1983.

URL : http://www.cs.virginia.edu/cs432/documents/sa-1983.pdf

J. Kober and J. Peters, Policy Search for Motor Primitives in Robotics, Advances in Neural Information Processing Systems, pp. 849-856, 2008.

J. Kober and J. Peters, Reinforcement Learning in Robotics: A Survey, Reinforcement Learning: State-of-the-Art, pp. 579-610, 2012.

S. Koenig and R. G. Simmons, The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms, Machine Learning, pp. 227-250, 1996.

R. Kohavi, A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, International Joint Conference on Artificial Intelligence (IJCAI), 1995.

S. C. Lemon, J. Roy, M. A. Clark, P. D. Friedmann, and W. Rakowski, Classification and Regression Tree Analysis in Public Health: Methodological Review and Comparison With Logistic Regression, Annals of Behavioral Medicine, vol. 26, issue 3, pp. 172-181, 2003.
DOI : 10.1207/s15324796abm2603_02

L. Li, M. L. Littman, and C. R. Mansley, Online exploration in least-squares policy iteration, The 8th International Conference on Autonomous Agents and Multiagent Systems, pp. 733-739, 2009.

W. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, issue 1, 2011.
DOI : 10.1002/widm.8

A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State, PhD thesis, University of Rochester, 1996.

T. McGeer, Passive Dynamic Walking, The International Journal of Robotics Research, vol. 9, issue 2, pp. 62-82, 1990.

N. Meuleau, E. Benazera, R. I. Brafman, E. A. Hansen, and Mausam, A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains, Journal of Artificial Intelligence Research, vol. 34, pp. 27-59, 2009.

B. Nemec, M. Zorko, and L. Zlajpah, Learning of a ball-in-a-cup playing robot, 19th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD 2010), 2010.
DOI : 10.1109/RAAD.2010.5524570

A. Y. Ng and M. I. Jordan, PEGASUS: A Policy Search Method for Large MDPs and POMDPs, Conference on Uncertainty in Artificial Intelligence, 2000.

A. Y. Ng, H. J. Kim, M. I. Jordan, and S. Sastry, Inverted autonomous helicopter flight via reinforcement learning, International Symposium on Experimental Robotics, 2004.

A. Nouri and M. L. Littman, Multi-resolution Exploration in Continuous Spaces, Advances in Neural Information Processing Systems, pp. 1209-1216, 2009.

J. Pazis and M. G. Lagoudakis, Binary action search for learning continuous-action control policies, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp. 793-800, 2009.
DOI : 10.1145/1553374.1553476

URL : http://www.cs.mcgill.ca/~icml2009/papers/532.pdf

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

N. Roy and S. Thrun, Online self-calibration for mobile robots, Proceedings of the 1999 IEEE International Conference on Robotics and Automation, pp. 2292-2297, 1999.
DOI : 10.1109/ROBOT.1999.770447

R. Y. Rubinstein, The Cross-Entropy Method for Combinatorial and Continuous Optimization, Methodology and Computing in Applied Probability, vol. 1, issue 2, pp. 127-190, 1999.

S. Sanner, K. V. Delgado, and L. N. de Barros, Symbolic Dynamic Programming for Discrete and Continuous State MDPs, Proceedings of the 26th Conference on Artificial Intelligence, 2012.

B. Scherrer, Performance bounds for λ-policy iteration and application to the game of Tetris, Journal of Machine Learning Research, vol. 14, issue 1, pp. 1181-1227, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00185271

H. O. Wang, K. Tanaka, and M. F. Griffin, An approach to fuzzy control of nonlinear systems: stability and design issues, IEEE Transactions on Fuzzy Systems, vol. 4, issue 1, 1996.
DOI : 10.1109/91.481841

A. Weinstein and M. L. Littman, Open-Loop Planning in Large-Scale Stochastic Domains, 27th AAAI Conference on Artificial Intelligence, pp. 1436-1442, 2013.

R. J. Williams and L. C. Baird, Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems, pp. 108-113, 1994.