P. Abbeel, M. Quigley, and A. Y. Ng, Using inaccurate models in reinforcement learning, Proceedings of the International Conference on Machine Learning (ICML), 2006.

A. Ajay, J. Wu, N. Fazeli, M. Bauza, L. P. Kaelbling et al., Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2018.

B. D. Anderson and J. B. Moore, Optimal filtering, vol.21, pp.22-95, 1979.

M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong et al., Hindsight experience replay, Advances in Neural Information Processing Systems (NIPS), 2017.

R. Antonova, A. Rai, and C. G. Atkeson, Sample efficient optimization for learning controllers for bipedal locomotion, Proceedings of the International Conference on Humanoid Robots (Humanoids), 2016.

R. Antonova, A. Rai, and C. G. Atkeson, Deep Kernels for Optimizing Locomotion Controllers, Proceedings of Conference on Robot Learning (CoRL), 2017.

J.-A. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, Data-efficient learning of feedback policies from image pixels using deep dynamical models, NIPS Deep Reinforcement Learning Workshop, 2015.

K. J. Åström, Introduction to stochastic control theory, Courier Corporation, 2012.

C. G. Atkeson, B. P. Babu, N. Banerjee, D. Berenson, C. P. Bove et al., No falls, no resets: Reliable humanoid behavior in the DARPA robotics challenge, Proceedings of the International Conference on Humanoid Robots (Humanoids), 2015.

C. G. Atkeson, B. Babu, N. Banerjee, D. Berenson, C. Bove et al., What happened at the DARPA robotics challenge, and why? DRC Finals Special Issue of the Journal of Field Robotics, 2016.

A. Auger and N. Hansen, A restart CMA evolution strategy with increasing population size, Proceedings of IEEE Congress on Evolutionary Computation, 2005.

T. Bartz-Beielstein, C. W. Lasarczyk, and M. Preuss, Sequential parameter optimization, Proceedings of IEEE Congress on Evolutionary Computation, 2005.

R. E. Bellman, Dynamic Programming, 1957.

F. Berkenkamp, A. P. Schoellig, and A. Krause, Safe Controller Optimization for Quadrotors with Gaussian Processes, Proceedings of the International Conference on Robotics and Automation (ICRA), 2016.

A. Billard, S. Calinon, R. Dillmann, and S. Schaal, Robot programming by demonstration, Springer handbook of robotics, pp.1371-1394, 2008.

B. Bischoff, D. Nguyen-Tuong, H. van Hoof, A. McHutchon, C. E. Rasmussen et al., Policy search for learning robot control using sparse data, Proceedings of the International Conference on Robotics and Automation (ICRA), 2014.

R. Bischoff, U. Huggenberger, and E. Prassler, KUKA youBot: a mobile manipulator for research and education, Proceedings of the International Conference on Robotics and Automation (ICRA), 2011.

J. Bongard, V. Zykov, and H. Lipson, Resilient machines through continuous self-modeling, Science, vol.314, issue.5802, pp.1118-1121, 2006.

J. C. Bongard and R. Pfeifer, Evolving complete agents using artificial ontogeny, Morpho-functional Machines: The New Species, pp.237-258, 2003.

F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic, Probabilistic Integration: A Role for Statisticians in Numerical Analysis? arXiv, 2015.

D. A. Bristow, M. Tharayil, and A. G. Alleyne, A survey of iterative learning control, IEEE Control Systems, vol.26, issue.3, pp.96-114, 2006.

E. Brochu, V. M. Cora, and N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.

R. A. Brooks, Elephants don't play chess, Robotics and Autonomous Systems, vol.6, issue.1-2, pp.3-15, 1990.

R. A. Brooks, Intelligence without representation, Artificial Intelligence, vol.47, issue.1-3, pp.139-159, 1991.

C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling et al., A survey of Monte Carlo tree search methods, IEEE Transactions on Computational Intelligence and AI in Games, vol.4, issue.1, pp.1-43, 2012.

J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, Learning variable impedance control, International Journal of Robotics Research, vol.30, issue.7, pp.820-833, 2011.

R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, Bayesian optimization for learning gaits under uncertainty, Annals of Mathematics and Artificial Intelligence, 2015.

S. Calinon, A tutorial on task-parameterized movement learning and retrieval, Intelligent Service Robotics, vol.9, issue.1, pp.1-29, 2016.

S. Calinon, F. Guenter, and A. Billard, On Learning, Representing and Generalizing a Task in a Humanoid Robot, IEEE Transactions on Systems, Man, and Cybernetics, vol.37, issue.2, pp.286-298, 2007.

S. Calinon, P. Kormushev, and D. G. Caldwell, Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning, Robotics and Autonomous Systems, vol.61, issue.4, pp.369-379, 2013.

S. Calinon, D. Bruno, and D. G. Caldwell, A task-parameterized probabilistic model with minimal intervention control, Proceedings of the International Conference on Robotics and Automation (ICRA), 2014.

E. F. Camacho and C. B. Alba, Model predictive control, 2013.

R. Camoriano, S. Traversaro, L. Rosasco, G. Metta, and F. Nori, Incremental semiparametric inverse dynamics learning, Proceedings of the International Conference on Robotics and Automation (ICRA), 2016.

J. Carlson and R. R. Murphy, How UGVs physically fail in the field, IEEE Transactions on Robotics, vol.21, issue.3, pp.423-437, 2005.

T. Cazenave and N. Jouandeau, On the parallelization of UCT, Proceedings of the Computer Games Workshop, 2007.

G. Chaslot, S. Bakkes, I. Szita, and P. Spronck, Monte-carlo tree search: A new framework for game AI, Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2008.

K. Chatzilygeroudis and J. Mouret, Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics, Proceedings of the International Conference on Robotics and Automation (ICRA), 2018.

K. Chatzilygeroudis, A. Cully, and J. Mouret, Towards semi-episodic learning for robot damage recovery, AILTA '16: Proceedings of the International Workshop "AI for Long-term Autonomy", 2016.

K. Chatzilygeroudis, R. Rama, R. Kaushik, D. Goepp, V. Vassiliades et al., Black-Box Data-efficient Policy Search for Robotics, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2017.

K. Chatzilygeroudis, V. Vassiliades, and J. Mouret, Reset-free trial-and-error learning for robot damage recovery, Robotics and Autonomous Systems, vol.100, pp.236-250, 2018.

K. Chatzilygeroudis, V. Vassiliades, F. Stulp, S. Calinon, and J. Mouret, A survey on policy search algorithms for learning robot controllers in a handful of trials, 2018.

K. Chua, R. Calandra, R. Mcallister, and S. Levine, Deep reinforcement learning in a handful of trials using probabilistic dynamics models, Advances in Neural Information Processing Systems (NIPS), 2018.

K. Ciosek and S. Whiteson, Expected policy gradients, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.

K. Ciosek and S. Whiteson, Expected Policy Gradients for Reinforcement Learning, 2018.

D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural Networks, vol.32, pp.333-338, 2012.

I. Clavera, A. Nagabandi, R. S. Fearing, P. Abbeel, S. Levine et al., Learning to adapt in dynamic, real-world environments through meta-reinforcement learning, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

D. Clever, M. Harant, K. Mombaur, M. Naveau, O. Stasse et al., COCoMoPL: A novel approach for humanoid walking generation combining optimal control, movement primitives and learning and its transfer to the real robot HRP-2, IEEE Robotics and Automation Letters, vol.2, issue.2, pp.977-984, 2017.

C. Colas, O. Sigaud, and P. Oudeyer, GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms, Proceedings of the International Conference on Machine Learning (ICML), 2018.

F. Corbato, On Building Systems That Will Fail, ACM Turing award lectures, vol.34, issue.9, pp.72-81, 2007.

A. Couëtoux, J. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard, Continuous upper confidence trees, Proceedings of the International Conference on Learning and Intelligent Optimization (LION), 2011.

A. Couëtoux, M. Milone, M. Brendel, H. Doghmen, M. Sebag et al., Continuous rapid action value estimates, Proceedings of Asian Conference on Machine Learning, 2011.

A. Cully and Y. Demiris, Quality and Diversity Optimization: A Unifying Modular Framework, IEEE Transactions on Evolutionary Computation, 2017.

A. Cully and Y. Demiris, Hierarchical behavioral repertoires with unsupervised descriptors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2018.

A. Cully, J. Clune, D. Tarapore, and J. Mouret, Robots that can adapt like animals, Nature, vol.521, issue.7553, pp.503-507, 2015.

A. Cully, K. Chatzilygeroudis, F. Allocati, and J. Mouret, Limbo: A Flexible High-performance Library for Gaussian Processes modeling and Data-Efficient Optimization, The Journal of Open Source Software, vol.3, issue.26, p.545, 2018.

M. Cutler and J. P. How, Efficient reinforcement learning for robots using informative simulated priors, Proceedings of the International Conference on Robotics and Automation (ICRA), 2015.

C. Daniel, G. Neumann, O. Kroemer, and J. Peters, Hierarchical relative entropy policy search, Journal of Machine Learning Research, pp.1-50, 2016.

DARPA, DARPA's ATLAS Robot Unveiled, 2013.

K. Deb, Multi-objective optimization, Search methodologies, pp.403-449, 2014.

K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, 2001.

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, vol.6, issue.2, pp.182-197, 2002.

M. Dedonato, F. Polido, K. Knoedler, B. P. Babu, N. Banerjee et al., Team WPI-CMU: Achieving Reliable Humanoid Behavior in the DARPA Robotics Challenge, Journal of Field Robotics, vol.34, issue.2, pp.381-399, 2017.

T. Degris, M. White, and R. S. Sutton, Linear off-policy actor-critic, Proceedings of the International Conference on Machine Learning (ICML), 2012.

M. P. Deisenroth and J. W. Ng, Distributed Gaussian processes, Proceedings of the International Conference on Machine Learning (ICML), 2015.

M. P. Deisenroth and C. E. Rasmussen, PILCO: A model-based and data-efficient approach to policy search, Proceedings of the International Conference on Machine Learning (ICML), 2011.

M. P. Deisenroth, C. E. Rasmussen, and D. Fox, Learning to control a low-cost manipulator using data-efficient reinforcement learning, Proceedings of Robotics: Science and Systems (RSS), 2011.

M. P. Deisenroth, R. Calandra, A. Seyfarth, and J. Peters, Toward fast policy search for learning legged locomotion, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2012.

M. P. Deisenroth, G. Neumann, and J. Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol.2, issue.1, pp.1-142, 2013.

M. P. Deisenroth, P. Englert, J. Peters, and D. Fox, Multi-task policy search for robotics, Proceedings of the International Conference on Robotics and Automation (ICRA), 2014.

M. P. Deisenroth, D. Fox, and C. E. Rasmussen, Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, pp.408-423, 2015.

S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, Learning and policy search in stochastic dynamical systems with Bayesian neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning, Proceedings of the International Conference on Machine Learning (ICML), 2018.

A. Doerr, C. Daniel, D. Nguyen-Tuong, A. Marco, S. Schaal et al., Optimizing long-term predictions for model-based policy search, Proceedings of Conference on Robot Learning (CoRL), 2017.

S. Doncieux and J. Mouret, Beyond black-box optimization: a review of selective pressures for evolutionary robotics, Evolutionary Intelligence, vol.7, issue.2, pp.71-93, 2014.

A. Droniou, S. Ivaldi, P. Stalph, M. Butz, and O. Sigaud, Learning velocity kinematics: Experimental comparison of on-line regression algorithms, Robotica, pp.15-20, 2012.

M. Duarte, J. Gomes, S. M. Oliveira, and A. L. Christensen, EvoRBC: evolutionary repertoire-based control for robots with arbitrary locomotion complexity, Proceedings of The Genetic and Evolutionary Computation Conference (GECCO), 2016.

M. Duarte, J. Gomes, S. M. Oliveira, and A. L. Christensen, Evolution of repertoire-based control for robots with complex locomotor systems, IEEE Transactions on Evolutionary Computation, 2017.

H. Durrant-whyte and T. Bailey, Simultaneous localization and mapping: part I, IEEE Robotics & Automation Magazine, vol.13, issue.2, pp.99-110, 2006.

B. Efron and R. J. Tibshirani, An introduction to the bootstrap, 1994.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the International Conference on Machine Learning (ICML), 2005.

H. Eskandari and C. D. Geiger, Evolutionary multiobjective optimization in noisy problem environments, Journal of Heuristics, vol.15, issue.6, p.559, 2009.

C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha et al., PathNet: Evolution channels gradient descent in super neural networks, 2017.

M. Feurer, J. T. Springenberg, and F. Hutter, Initializing Bayesian hyperparameter optimization via meta-learning, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2015.

P. Fidelman and P. Stone, Learning ball acquisition on a physical robot, Proceedings of the International Symposium on Robotics and Automation (ISRA), 2004.

C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, Proceedings of the International Conference on Machine Learning (ICML), 2017.

S. Forestier, Y. Mollard, and P. Oudeyer, Intrinsically motivated goal exploration processes with automatic curriculum learning, 2017.

A. Gaier, A. Asteroth, and J. Mouret, Feature space modeling through surrogate illumination, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2017.

Y. Gal and Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, Proceedings of the International Conference on Machine Learning (ICML), 2016.

Y. Gal, R. T. Mcallister, and C. E. Rasmussen, Improving PILCO with Bayesian neural network dynamics models, Data-Efficient Machine Learning Workshop, 2016.

C. Garcia and M. Delakis, Convolutional face finder: A neural architecture for fast and robust face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.11, pp.1408-1423, 2004.

C. E. Garcia, D. M. Prett, and M. Morari, Model predictive control: theory and practice-a survey, Automatica, vol.25, issue.3, pp.335-348, 1989.

J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and J. P. Cunningham, Bayesian Optimization with Inequality Constraints, Proceedings of the International Conference on Machine Learning (ICML), 2014.

J. Gottlieb and P. Oudeyer, Information-seeking, curiosity, and attention: computational and neural mechanisms, Trends in Cognitive Sciences, vol.17, issue.11, pp.585-593, 2013.

F. Guenter, M. Hersch, S. Calinon, and A. Billard, Reinforcement learning for imitating constrained reaching movements, Advanced Robotics, Special Issue on Imitative Robots, vol.21, issue.13, pp.1521-1544, 2007.

E. Guizzo, Fukushima robot operator writes tell-all blog, IEEE Spectrum, 2011.

D. Ha and J. Schmidhuber, Recurrent World Models Facilitate Policy Evolution, Advances in Neural Information Processing Systems (NIPS), 2018.

T. Haarnoja, V. Pong, A. Zhou, M. Dalal, P. Abbeel et al., Composable Deep Reinforcement Learning for Robotic Manipulation, Proceedings of the International Conference on Robotics and Automation (ICRA), 2018.

D. Hafner, D. Tran, A. Irpan, T. Lillicrap, and J. Davidson, Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors, 2018.

N. Hansen, The CMA Evolution Strategy: A Comparing Review, 2006.

N. Hansen, Benchmarking a BI-population CMA-ES on the BBOB-2009 noisy testbed, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2009.

N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.

N. Hansen, A. S. Niederberger, L. Guzzella, and P. Koumoutsakos, A method for handling uncertainty in evolutionary optimization with an application to feedback control of combustion, IEEE Transactions on Evolutionary Computation, vol.13, issue.1, pp.180-197, 2009.

H. van Hasselt and M. A. Wiering, Reinforcement learning in continuous action spaces, Proceedings of the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), 2007.

N. Heess, S. Sriram, J. Lemmon, J. Merel, G. Wayne et al., Emergence of locomotion behaviours in rich environments, 2017.

V. Heidrich-Meisner and C. Igel, Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the International Conference on Machine Learning (ICML), 2009.

P. Hennig and C. J. Schuler, Entropy search for information-efficient global optimization, Journal of Machine Learning Research, vol.13, pp.1809-1837, 2012.

T. Hester and P. Stone, TEXPLORE: real-time sample-efficient reinforcement learning for robots, Machine Learning, vol.90, pp.385-429, 2013.

J. C. Higuera, D. Meger, and G. Dudek, Synthesizing Neural Network Controllers with Probabilistic Model-based Reinforcement Learning, 2018.

R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. De Turck et al., VIME: Variational information maximizing exploration, Advances in Neural Information Processing Systems (NIPS), 2016.

I. Hupkens, A. Deutz, K. Yang, and M. Emmerich, Faster exact algorithms for computing expected hypervolume improvement, Proceedings of the International Conference on Evolutionary Multi-Criterion Optimization, 2015.

F. Hutter, H. H. Hoos, K. Leyton-Brown, and K. P. Murphy, An Experimental Investigation of Model-based Parameter Optimisation: SPO and Beyond, Proceedings of The Genetic and Evolutionary Computation Conference (GECCO), 2009.

A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal, Dynamical movement primitives: Learning attractor models for motor behaviors, Neural Computation, vol.25, issue.2, pp.328-373, 2013.

A. J. Ijspeert, J. Nakanishi, and S. Schaal, Movement imitation with nonlinear dynamical systems in humanoid robots, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.

A. J. Ijspeert, J. Nakanishi, and S. Schaal, Learning attractor landscapes for learning motor primitives, Advances in Neural Information Processing Systems (NIPS), 2003.

V. T. Inman and H. D. Eberhart, The major determinants in normal and pathological gait, JBJS, vol.35, issue.3, pp.543-558, 1953.

R. Isermann, Fault-diagnosis systems: an introduction from fault detection to fault tolerance, 2006.

D. H. Jacobson and D. Q. Mayne, Differential dynamic programming, 1970.

N. Jakobi, Evolutionary robotics and the radical envelope-of-noise hypothesis, Adaptive Behavior, vol.6, issue.2, pp.325-368, 1997.

N. Jakobi, P. Husbands, and I. Harvey, Noise and the reality gap: The use of simulation in evolutionary robotics, Proceedings of the European Conference on Artificial Life, 1995.

S. James, A. J. Davison, and E. Johns, Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task, Proceedings of the Conference on Robot Learning (CoRL), 2017.

Y. Jin and J. Branke, Evolutionary optimization in uncertain environments-a survey, IEEE Transactions on Evolutionary Computation, vol.9, issue.3, pp.303-317, 2005.

S. G. Johnson, The NLopt nonlinear-optimization package.

R. Jonschkowski and O. Brock, Learning state representations with robotic priors, Autonomous Robots, vol.39, issue.3, pp.407-428, 2015.

R. Jonschkowski, D. Rastogi, and O. Brock, Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors, Proceedings of Robotics: Science and Systems (RSS), 2018.

S. J. Julier and J. K. Uhlmann, Unscented filtering and nonlinear estimation, Proceedings of the IEEE, vol.92, issue.3, pp.401-422, 2004.

L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: A survey, Journal of Artificial Intelligence Research, vol.4, pp.237-285, 1996.

S. M. Kakade, A natural policy gradient, Advances in Neural Information Processing Systems (NIPS), 2002.

M. Kalakrishnan, L. Righetti, P. Pastor, and S. Schaal, Learning force control policies for compliant manipulation, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2011.

R. E. Kalman, A new approach to linear filtering and prediction problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.

M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu, Convergence guarantees for kernel-based quadrature rules in misspecified settings, Advances in Neural Information Processing Systems (NIPS), 2016.

K. Kandasamy, J. Schneider, and B. Póczos, High dimensional Bayesian optimisation and bandits via additive models, Proceedings of the International Conference on Machine Learning (ICML), 2015.

S. Karaman and E. Frazzoli, Sampling-based algorithms for optimal motion planning, The International Journal of Robotics Research, vol.30, issue.7, pp.846-894, 2011.

R. Kaushik, K. Chatzilygeroudis, and J. Mouret, Multi-objective model-based policy search for data-efficient learning with sparse rewards, Proceedings of the Conference on Robot Learning (CoRL), 2018.

E. Keogh and A. Mueen, Curse of dimensionality, Encyclopedia of Machine Learning, pp.257-258, 2011.

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, 2017.

J. Ko, D. J. Klein, D. Fox, and D. Haehnel, Gaussian processes and reinforcement learning for identification and control of an autonomous blimp, Proceedings of the International Conference on Robotics and Automation (ICRA), 2007.

J. Kober and J. Peters, Learning motor primitives for robotics, Proceedings of the International Conference on Robotics and Automation (ICRA), 2009.

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, International Journal of Robotics Research, vol.32, issue.11, pp.1238-1274, 2013.

J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse et al., Whole-body model-predictive control applied to the HRP-2 humanoid, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2015.

N. Kohl and P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, Proceedings of the International Conference on Robotics and Automation (ICRA), 2004.

V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, Advances in Neural Information Processing Systems (NIPS), 2000.

S. Koos and J. Mouret, Online discovery of locomotion modes for wheel-legged hybrid robots: A transferability-based approach, Proceedings of the International Conference on Climbing and Walking Robots and Support Technologies for Mobile Machines, 2012.

S. Koos, A. Cully, and J. Mouret, Fast damage recovery in robotics with the T-resilience algorithm, The International Journal of Robotics Research, vol.32, issue.14, pp.1700-1723, 2013.

S. Koos, J. Mouret, and S. Doncieux, The transferability approach: Crossing the reality gap in evolutionary robotics, IEEE Transactions on Evolutionary Computation, vol.17, issue.1, pp.122-145, 2013.

M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic, Nonlinear and adaptive control design, vol.222, 1995.

V. Kumar, E. Todorov, and S. Levine, Optimal control with learned local models: Application to dexterous manipulation, Proceedings of the International Conference on Robotics and Automation (ICRA), 2016.

A. Kupcsik, M. P. Deisenroth, J. Peters, A. P. Loh, P. Vadakkepat et al., Model-based contextual policy search for data-efficient generalization of robot skills, Artificial Intelligence, vol.247, pp.415-439, 2017.

H. J. Kushner, A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise, Journal of Basic Engineering, vol.86, issue.1, pp.97-106, 1964.

K. J. Kyriakopoulos and G. N. Saridis, Minimum jerk path generation, Proceedings of the International Conference on Robotics and Automation (ICRA), 1988.

A. Laversanne-finot, A. Péré, and P. Oudeyer, Curiosity Driven Exploration of Learned Disentangled Goal Spaces, Proceedings of the Conference on Robot Learning (CoRL), 2018.

S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks, vol.8, issue.1, pp.98-113, 1997.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

G. Lee, S. S. Srinivasa, and M. T. Mason, GP-ILQG: Data-driven robust optimal control for uncertain nonlinear dynamical systems, 2017.

J. Lee, DART: Dynamic Animation and Robotics Toolkit, The Journal of Open Source Software, vol.3, issue.22, 2018.

J. H. Lee, K. S. Lee, and W. C. Kim, Model-based iterative learning control with a quadratic criterion for time-varying linear systems, Automatica, vol.36, issue.5, pp.641-657, 2000.

K. S. Lee, I. Chin, H. J. Lee, and J. H. Lee, Model predictive control technique combined with iterative learning for batch processes, AIChE Journal, vol.45, issue.10, pp.2175-2187, 1999.

S. Legg and M. Hutter, Universal intelligence: A definition of machine intelligence, Minds and Machines, vol.17, pp.391-444, 2007.

J. Lehman and K. O. Stanley, Exploiting open-endedness to solve problems through the search for novelty, Proceedings of the Conference on Artificial Life (ALIFE), 2008.

J. Lehman and K. O. Stanley, Abandoning objectives: Evolution through the search for novelty alone, Evolutionary Computation, vol.19, issue.2, pp.189-223, 2011.

J. Lehman, S. Risi, and J. Clune, Creative generation of 3D objects with deep learning and innovation engines, Proceedings of the International Conference on Computational Creativity, 2016.

J. Lehman, J. Chen, J. Clune, and K. O. Stanley, ES Is More Than Just a Traditional Finite-Difference Approximator, 2017.

S. Lengagne, J. Vaillant, E. Yoshida, and A. Kheddar, Generation of whole-body optimal dynamic multi-contact motions, International Journal of Robotics Research, vol.32, issue.9, pp.1104-1119, 2013.

T. Lesort and D. Filliat, Unsupervised deep learning of state representation using robotic priors, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

T. Lesort, M. Seurin, X. Li, N. D. Rodríguez, and D. Filliat, Unsupervised state representation learning with robotic priors: a robustness benchmark, 2017.

T. Lesort, N. Díaz-rodríguez, J. Goudou, and D. Filliat, State representation learning for control: An overview, 2018.

S. Levine and P. Abbeel, Learning neural network policies with guided policy search under unknown dynamics, Advances in Neural Information Processing Systems (NIPS), 2014.

S. Levine and V. Koltun, Guided policy search, Proceedings of the International Conference on Machine Learning (ICML), 2013.

S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-end training of deep visuomotor policies, Journal of Machine Learning Research, vol.17, issue.39, pp.1-40, 2016.

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez et al., Continuous control with deep reinforcement learning, Proceedings of the International Conference on Learning Representations (ICLR), 2016.

H. Liu, Y. Ong, X. Shen, and J. Cai, When Gaussian Process Meets Big Data: A Review of Scalable GPs, 2018.

D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, Automatic gait optimization with Gaussian process regression, Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), 2007.

R. Lober, V. Padois, and O. Sigaud, Efficient reinforcement learning for humanoid whole-body control, Proceedings of the International Conference on Humanoid Robots (Humanoids), 2016.

R. Lober, J. Eljaik, G. Nava, S. Dafarra, F. Romano et al., Optimizing Task Feasibility using Model-Free Policy Search and Model-Based Whole-Body Control, Proceedings of the International Conference on Robotics and Automation (ICRA), 2017.

K. M. Lynch and F. C. Park, Modern Robotics, 2017.

N. Mansard, O. Stasse, P. Evrard, and A. Kheddar, A versatile generalized inverted kinematics implementation for collaborative working humanoid robots: The stack of tasks, Proceedings of the International Conference on Advanced Robotics, 2009.

A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause et al., Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization, Proceedings of the International Conference on Robotics and Automation (ICRA), 2017.

R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos, Active Policy Learning for Robot Planning and Exploration under Uncertainty, Proceedings of Robotics: Science and Systems (RSS), 2007.

R. Martinez-Cantin, N. de Freitas, E. Brochu, J. Castellanos, and A. Doucet, A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot, Autonomous Robots, vol.27, issue.2, 2009.

T. Matsubara, S. Hyon, and J. Morimoto, Learning parametric dynamic movement primitives from multiple demonstrations, Neural Networks, vol.24, issue.5, pp.493-500, 2011.

D. Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, International Journal of Control, vol.3, issue.1, pp.85-95, 1966.

R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine learning: An artificial intelligence approach, 2013.

B. L. Miller and D. E. Goldberg, Genetic algorithms, selection schemes, and the varying effects of noise, Evolutionary Computation, vol.4, issue.2, pp.113-131, 1996.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.

V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap et al., Asynchronous methods for deep reinforcement learning, Proceedings of the International Conference on Machine Learning (ICML), 2016.

W. Montgomery, A. Ajay, C. Finn, P. Abbeel, and S. Levine, Reset-free guided policy search: efficient deep reinforcement learning with stochastic initial states, Proceedings of the International Conference on Robotics and Automation (ICRA), 2017.

K. L. Moore, M. Dahleh, and S. Bhattacharyya, Iterative learning control: A survey and new results, Journal of Robotic Systems, vol.9, issue.5, pp.563-594, 1992.

C. Moulin-frier and P. Oudeyer, Exploration strategies in developmental robotics: a unified probabilistic framework, Proceedings of the Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL), 2013.

J. Mouret, Novelty-based Multiobjectivization, New Horizons in Evolutionary Robotics, pp.139-154, 2011.

J. Mouret, Micro-data learning: The other end of the spectrum, 2016.

J. Mouret and K. Chatzilygeroudis, 20 Years of Reality Gap: a few Thoughts about Simulators in Evolutionary Robotics, Workshop "Simulation in Evolutionary Robotics", 2017.

J. Mouret and J. Clune, Illuminating search spaces by mapping elites, 2015.

J. Mouret and S. Doncieux, Sferes v2: Evolvin' in the multi-core world, Proceedings of Congress on Evolutionary Computation (CEC), 2010.

J. Mouret and S. Doncieux, Encouraging Behavioral Diversity in Evolutionary Robotics: an Empirical Study, Evolutionary Computation, vol.20, issue.1, pp.91-133, 2012.

R. M. Murray, A mathematical introduction to robotic manipulation, 2017.

J. A. Musick and C. J. Limpus, Habitat utilization and migration in juvenile sea turtles. The biology of sea turtles, vol.1, pp.137-163, 1997.

G. Nelson, A. Saunders, N. Neville, B. Swilling, J. Bondaryk et al., PETMAN: A humanoid robot for testing chemical protective clothing, Journal of the Robotics Society of Japan, vol.30, issue.4, pp.372-377, 2012.

A. Y. Ng and M. Jordan, PEGASUS: a policy search method for large MDPs and POMDPs, Proceedings of Uncertainty in Artificial Intelligence, 2000.

A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte et al., Autonomous inverted helicopter flight via reinforcement learning, Experimental Robotics IX, pp.363-372, 2006.

A. Nguyen, J. Yosinski, and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

A. Nguyen, J. Yosinski, and J. Clune, Understanding Innovation Engines: Automated Creativity and Improved Stochastic Optimization via Deep Learning, Evolutionary Computation, vol.24, pp.545-572, 2016.

A. M. Nguyen, J. Yosinski, and J. Clune, Innovation engines: Automated creativity and improved stochastic optimization via deep learning, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2015.

D. Nguyen-Tuong and J. Peters, Using model knowledge for learning inverse dynamics, Proceedings of the International Conference on Robotics and Automation (ICRA), 2010.

D. Nguyen-Tuong and J. Peters, Model learning for robot control: a survey, Cognitive Processing, vol.12, issue.4, pp.319-340, 2011.

N. J. Nilsson, Shakey the robot, 1984.

F. Nori, S. Traversaro, J. Eljaik, F. Romano, A. Del-prete et al., iCub whole-body control through force regulation on rigid non-coplanar contacts, Frontiers in Robotics and AI, vol.2, issue.6, 2015.

C. Null and B. Caulfield, Fade to Black: The 1980s vision of "lights-out" manufacturing, where robots do all the work, is a dream no more, 2003.

J. Oh, X. Guo, H. Lee, R. L. Lewis, and S. Singh, Action-conditional video prediction using deep networks in atari games, Advances in Neural Information Processing Systems (NIPS), pp.2863-2871, 2015.

A. O'Hagan, Bayes-Hermite quadrature, Journal of Statistical Planning and Inference, 1991.

P. Oudeyer, Curiosity and languages, Catalogue of the Exhibition, Fondation Cartier pour l'Art Contemporain, p.180, 2011.

P. Oudeyer, Computational Theories of Curiosity-Driven Learning, 2018.

P. Oudeyer, F. Kaplan, V. V. Hafner, and A. Whyte, The playground experiment: Task-independent development of a curious robot, Proceedings of the AAAI Spring Symposium on Developmental Robotics, 2005.

P. Oudeyer, F. Kaplan, and V. V. Hafner, Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, vol.11, issue.2, pp.265-286, 2007.

V. Padois, S. Ivaldi, J. Babič, M. Mistry, J. Peters et al., Whole-body multi-contact motion in humans and humanoids: Advances of the CoDyCo European project, Robotics and Autonomous Systems, vol.90, pp.97-117, 2017.

V. Papaspyros, K. Chatzilygeroudis, V. Vassiliades, and J. Mouret, Safety-aware robot damage recovery using constrained Bayesian optimization and simulated priors, BayesOpt'16: Proceedings of the International Workshop "Bayesian Optimization: Black-box Optimization and Beyond" at NIPS, 2016.

C. Park and D. Apley, Patchwork kriging for large-scale Gaussian process regression, 2017.

S. Paul, K. Chatzilygeroudis, K. Ciosek, J. Mouret, M. A. Osborne et al., Alternating Optimisation and Quadrature for Robust Control, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.

R. Pautrat, K. Chatzilygeroudis, and J. Mouret, Bayesian optimization with automatic prior selection for data-efficient direct policy search, Proceedings of the International Conference on Robotics and Automation (ICRA), 2018.

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, 2017.

R. Pfeifer and J. Bongard, How the body shapes the way we think: a new view of intelligence, 2006.

C. Plagemann, S. Mischke, S. Prentice, K. Kersting, N. Roy et al., Learning predictive terrain models for legged robot locomotion, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2008.

A. S. Polydoros and L. Nalpantidis, Survey of Model-Based Reinforcement Learning: Applications on Robotics, Journal of Intelligent & Robotic Systems, pp.1-21, 2017.

J. K. Pugh, L. B. Soros, and K. O. Stanley, Quality diversity: A new frontier for evolutionary computation, Frontiers in Robotics and AI, vol.3, p.40, 2016.

J. Queisser and J. Steil, Bootstrapping of Parameterized Skills Through Hybrid Optimization in Task and Policy Spaces, Frontiers in Robotics and AI, 2018.

J. Quiñonero-Candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research, vol.6, pp.1939-1959, 2005.

M. Raibert, K. Blankespoor, G. Nelson, and R. Playter, BigDog, the rough-terrain quadruped robot, Proceedings of IFAC, pp.10822-10825, 2008.

A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine, EPOpt: Learning robust neural network policies using model ensembles, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

C. E. Rasmussen and Z. Ghahramani, Bayesian Monte Carlo, Advances in Neural Information Processing Systems (NIPS), 2003.

C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning, vol.1, 2006.

J. B. Rawlings and D. Q. Mayne, Model predictive control: Theory and design, 2009.

J. Rieffel and J. Mouret, Soft tensegrity robots, Soft Robotics, 2018.

P. Rolet, M. Sebag, and O. Teytaud, Boosting active learning to optimality: A tractable Monte-Carlo, billiard-based algorithm, Proceedings of the European Conference on Machine Learning (ECML), 2009.

P. Rolland, J. Scarlett, I. Bogunovic, and V. Cevher, High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

T. H. Rowan, Functional stability analysis of numerical algorithms, 1990.

G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, vol.37, 1994.

S. J. Russell and P. Norvig, Artificial intelligence: a modern approach.

R. M. Ryan and E. L. Deci, Intrinsic and extrinsic motivations: Classic definitions and new directions, Contemporary educational psychology, vol.25, issue.1, pp.54-67, 2000.

S. Saemundsson, K. Hofmann, and M. P. Deisenroth, Meta reinforcement learning with latent variable Gaussian processes, Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2018.

J. Salini, V. Padois, and P. Bidaud, Synthesis of complex humanoid whole-body behavior: a focus on sequencing and tasks transitions, Proceedings of the International Conference on Robotics and Automation (ICRA), 2011.

M. Saveriano, Y. Yin, P. Falco, and D. Lee, Data-efficient control policy search using residual dynamics learning, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2017.

J. Schmidhuber, Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts, Connection Science, vol.18, issue.2, pp.173-187, 2006.

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, International Conference on Machine Learning (ICML), pp.1889-1897, 2015.

R. Sellaouti, O. Stasse, S. Kajita, K. Yokoi, and A. Kheddar, Faster and smoother walking of humanoid HRP-2 with passive toe joints, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2006.

R. J. Serfling, Probability inequalities for the sum in sampling without replacement, The Annals of Statistics, pp.39-48, 1974.

B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proceedings of the IEEE, vol.104, issue.1, pp.148-175, 2016.

B. Siciliano and O. Khatib, Springer handbook of robotics, 2016.

O. Sigaud and F. Stulp, Policy search in continuous action domains: an overview, 2018.

D. Silver and J. Veness, Monte-carlo planning in large POMDPs, Advances in Neural Information Processing Systems (NIPS), 2010.

D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra et al., Deterministic policy gradient algorithms, Proceedings of the International Conference on Machine Learning (ICML), 2014.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai et al., Mastering chess and shogi by self-play with a general reinforcement learning algorithm, 2017.

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang et al., Mastering the game of go without human knowledge, Nature, vol.550, issue.7676, p.354, 2017.

K. Sims, Evolving virtual creatures, Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1994.

J. Snoek, K. Swersky, R. Zemel, and R. Adams, Input warping for Bayesian optimization of non-stationary functions, Proceedings of the International Conference on Machine Learning (ICML), 2014.

J. Spitz, K. Bouyarmane, S. Ivaldi, and J. Mouret, Trial-and-error learning of repulsors for humanoid QP-based whole-body control, Proceedings of the International Conference on Humanoid Robots (Humanoids), 2017.

M. W. Spong and D. J. Block, The pendubot: A mechatronic system for control research and education, Proceedings of the Conference on Decision and Control (CDC), 1995.

N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: no regret and experimental design, Proceedings of the International Conference on Machine Learning (ICML), 2010.

K. O. Stanley and R. Miikkulainen, Evolving neural networks through augmenting topologies, Evolutionary Computation, 2002.

F. Stulp and O. Sigaud, Policy improvement: Between black-box optimization and episodic reinforcement learning, Journées Francophones Planification, Décision, et Apprentissage pour la conduite de systèmes, 2013.

F. Stulp and O. Sigaud, Robot skill learning: From reinforcement learning to evolution strategies, Paladyn. Journal of Behavioral Robotics, vol.4, issue.1, pp.49-61, 2013.

F. Stulp, E. Theodorou, and S. Schaal, Reinforcement learning with sequences of motion primitives for robust manipulation, IEEE Transactions on Robotics, vol.28, issue.6, pp.1360-1370, 2012.

F. Stulp, G. Raiola, A. Hoarau, S. Ivaldi, and O. Sigaud, Learning compact parameterized skills with a single regression, Proceedings of the International Conference on Humanoid Robots (Humanoids), 2013.

M. Sugiyama, I. Takeuchi, T. Suzuki, T. Kanamori, H. Hachiya et al., Least-squares conditional density estimation, IEICE Transactions on Information and Systems, vol.93, issue.3, pp.583-594, 2010.

R. S. Sutton, Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin, vol.2, issue.4, pp.160-163, 1991.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, vol.1, 1998.

R. S. Sutton, D. A. Mcallester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems (NIPS), pp.1057-1063, 2000.

Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, Deepface: Closing the gap to human-level performance in face verification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

V. Tangkaratt, S. Mori, T. Zhao, J. Morimoto, and M. Sugiyama, Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation, Neural Networks, vol.57, pp.128-140, 2014.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, vol.10, pp.1633-1685, 2009.

E. Todorov and W. Li, A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Proceedings of the American Control Conference, 2005.

N. G. Tsagarakis, G. Metta, G. Sandini, D. Vernon, R. Beira et al., iCub: the design and realization of an open humanoid platform for cognitive and neuroscience research, Advanced Robotics, vol.21, issue.10, pp.1151-1175, 2007.

S. Tsutsui and A. Ghosh, Genetic algorithms with a robust solution searching scheme, IEEE Transactions on Evolutionary Computation, vol.1, issue.3, pp.201-208, 1997.

S. C. Turaga, J. F. Murray, V. Jain, F. Roth, M. Helmstaedter et al., Convolutional networks can learn to generate affinity graphs for image segmentation, Neural Computation, vol.22, issue.2, pp.511-538, 2010.

R. Vaillant, C. Monrocq, and Y. LeCun, Original approach for the localisation of objects in images, IEE Proceedings-Vision, Image and Signal Processing, vol.141, pp.245-250, 1994.

H. van Seijen, H. van Hasselt, S. Whiteson, and M. Wiering, A theoretical and empirical analysis of Expected Sarsa, Proceedings of the Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2009.

V. Vassiliades, K. Chatzilygeroudis, and J. Mouret, Using centroidal Voronoi tessellations to scale up the multi-dimensional archive of phenotypic elites algorithm, IEEE Transactions on Evolutionary Computation, 2017.

V. Verma, G. Gordon, R. Simmons, and S. Thrun, Real-time fault diagnosis, IEEE Robotics & Automation Magazine, vol.11, issue.2, pp.56-66, 2004.

N. Wahlström, T. B. Schön, and M. P. Deisenroth, Learning deep dynamical models from image pixels, Proceedings of the 17th IFAC Symposium on System Identification (SYSID), 2015.

Y. Wang, D. Zhou, and F. Gao, Iterative learning model predictive control for multi-phase batch processes, Journal of Process Control, vol.18, issue.6, pp.543-557, 2008.

Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas, Bayesian optimization in a billion dimensions via random embeddings, Journal of Artificial Intelligence Research, vol.55, pp.361-387, 2016.

W. B. Powell and I. O. Ryzhov, Optimal Learning, Wiley Series in Probability and Statistics, 2012.

C. J. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3-4, pp.279-292, 1992.

P. Wawrzyński, Learning to control a 6-degree-of-freedom walking robot, Proceedings of the International Conference on Computer as a Tool (EUROCON), 2007.

P. Werbos, Approximate dynamic programming for realtime control and neural modelling, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, pp.493-525, 1992.

B. J. Williams, T. J. Santner, and W. I. Notz, Sequential design of computer experiments to minimize integrated response functions, Statistica Sinica, 2000.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992.

A. Wilson, A. Fern, and P. Tadepalli, Using trajectory data to improve Bayesian optimization for reinforcement learning, Journal of Machine Learning Research, vol.15, issue.1, pp.253-282, 2014.

T. Wu and J. Movellan, Semi-parametric Gaussian process for robot system identification, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2012.

K. Yang, D. Gaida, T. Bäck, and M. Emmerich, Expected hypervolume improvement algorithm for PID controller tuning and the multiobjective dynamical control of a biogas plant, Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2015.

W. Yu, C. K. Liu, and G. Turk, Preparing for the unknown: Learning a universal policy with online system identification, Proceedings of Robotics: Science and Systems, 2017.

T. Zhang, G. Kahn, S. Levine, and P. Abbeel, Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, Proceedings of the International Conference on Robotics and Automation (ICRA), 2016.

S. Zhu, A. Kimmel, K. E. Bekris, and A. Boularias, Fast Model Identification via Physics Engines for Data-Efficient Policy Search, Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), 2018.

M. Zimmer, Y. Boniface, and A. Dutech, Neural Fitted Actor-Critic, Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2016.