P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Twenty-first International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015430

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.6759

D. Aberdeen and O. Buffet, Concurrent probabilistic temporal planning with policy-gradients, International Conference on Automated Planning and Scheduling, 2007.

R. Akrour, M. Schoenauer, and M. Sebag, Preference-based policy learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00625001

C. Amato, G. Konidaris, G. Cruz et al., Planning for decentralized control of multiple robots under uncertainty, 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.
DOI : 10.1109/ICRA.2015.7139350

C. Amato, G. D. Konidaris, and L. P. Kaelbling, Planning with macro-actions in decentralized POMDPs, International Conference on Autonomous Agents and Multiagent Systems, 2014.

B. D. Argall, S. Chernova, M. Veloso, and B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, issue.5, pp.469-483, 2009.
DOI : 10.1016/j.robot.2008.10.024

E. Beaudry, F. Kabanza, and F. Michaud, Planning with concurrency under resources and time uncertainty, European Conference on Artificial Intelligence, 2010.

H. Blockeel and L. De Raedt, Top-down induction of first-order logical decision trees, Artificial Intelligence, vol.101, issue.1-2, 1998.
DOI : 10.1016/s0004-3702(98)00034-4

URL : http://doi.org/10.1016/s0004-3702(98)00034-4

L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees, 1984.

C. B. Browne, E. Powley, D. Whitehouse et al., A survey of Monte Carlo tree search methods, IEEE Transactions on Computational Intelligence and AI in Games, 2012.

O. Buffet and D. Aberdeen, The factored policy-gradient planner, Artificial Intelligence, vol.173, issue.5-6, 2009.
DOI : 10.1016/j.artint.2008.11.008

URL : https://hal.archives-ouvertes.fr/inria-00330031

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, User simulation in dialogue systems using inverse reinforcement learning, Interspeech, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652446

S. Chernova and M. Veloso, Interactive policy learning through confidence-based autonomy, Journal of Artificial Intelligence Research, vol.34, issue.1, 2009.

L. De Raedt, Logical and Relational Learning, 2008.
DOI : 10.1007/978-3-540-68856-3

S. Džeroski, L. De Raedt, and K. Driessens, Relational reinforcement learning, Machine Learning, vol.43, issue.1-2, 2001.
DOI : 10.1007/BFb0027307

G. E. Fainekos, H. Kress-Gazit, and G. J. Pappas, Hybrid Controllers for Path Planning: A Temporal Logic Approach, Proceedings of the 44th IEEE Conference on Decision and Control, 2005.
DOI : 10.1109/CDC.2005.1582935

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.4410

J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics, vol.29, issue.5, 2001.

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine Learning, vol.63, issue.1, 2006.
DOI : 10.1007/s10994-006-6226-1

URL : https://hal.archives-ouvertes.fr/hal-00341932

M. C. Gombolay, R. A. Gutierrez, S. G. Clarke, G. F. Sturla, and J. A. Shah, Decision-making authority, team efficiency and human worker satisfaction in mixed human-robot teams, Autonomous Robots, vol.39, issue.3, 2015.
DOI : 10.15607/rss.2014.x.046

D. H. Grollman and O. C. Jenkins, Dogged Learning for Robots, Proceedings 2007 IEEE International Conference on Robotics and Automation, 2007.
DOI : 10.1109/ROBOT.2007.363692

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.179.3071

N. Hansen, S. Müller, and P. Koumoutsakos, Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES), Evolutionary Computation, vol.11, issue.1, 2003.
DOI : 10.1162/106365601750190398

S. Hart and R. Grupen, Learning Generalizable Control Programs, IEEE Transactions on Autonomous Mental Development, vol.3, issue.3, 2011.
DOI : 10.1109/TAMD.2010.2103311

A. Jain, S. Sharma, T. Joachims, and A. Saxena, Learning preferences for manipulation tasks from online coactive feedback, The International Journal of Robotics Research, vol.34, issue.10, 2015.
DOI : 10.1177/0278364915581193

A. Jain, B. Wojcik, T. Joachims, and A. Saxena, Learning trajectory preferences for manipulators via iterative improvement, Conference on Neural Information Processing Systems, 2013.

T. Keller and M. Helmert, Trial-based heuristic tree search for finite horizon MDPs, International Conference on Automated Planning and Scheduling, 2013.

K. Kersting, M. van Otterlo, and L. De Raedt, Bellman goes relational, Twenty-first International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015401

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.8073

R. Khardon, Learning action strategies for planning domains, Artificial Intelligence, vol.113, issue.1-2, 1999.
DOI : 10.1016/S0004-3702(99)00060-0

URL : http://doi.org/10.1016/s0004-3702(99)00060-0

E. Klein, M. Geist, B. Piot, and O. Pietquin, Inverse reinforcement learning through structured classification, Conference on Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624

E. Klein, B. Piot, M. Geist, and O. Pietquin, A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
DOI : 10.1007/978-3-642-40988-2_1

URL : https://hal.archives-ouvertes.fr/hal-00869804

W. B. Knox and P. Stone, Combining manual feedback with subsequent MDP reward signals for reinforcement learning, International Conference on Autonomous Agents and Multiagent Systems, 2010.

W. B. Knox, P. Stone, and C. Breazeal, Training a Robot via Human Feedback: A Case Study, International Conference on Social Robotics, 2013.
DOI : 10.1007/978-3-319-02675-6_46

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.378.2844

J. Kober and J. Peters, Learning motor primitives for robotics, 2009 IEEE International Conference on Robotics and Automation, 2009.
DOI : 10.1109/ROBOT.2009.5152577

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.8287

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, European Conference on Machine Learning, 2006.
DOI : 10.1007/11871842_29

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

H. S. Koppula, A. Jain, and A. Saxena, Anticipatory Planning for Human-Robot Teams, International Symposium on Experimental Robotics, 2016.
DOI : 10.1109/ICRA.2013.6631293

T. Lang and M. Toussaint, Planning with noisy probabilistic relational rules, Journal of Artificial Intelligence Research, vol.39, issue.1, 2010.

T. Lang, M. Toussaint, and K. Kersting, Exploration in relational domains for model-based reinforcement learning, The Journal of Machine Learning Research, vol.13, issue.1, 2012.

M. K. Lee, J. Forlizzi, S. Kiesler et al., Personalization in HRI, Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '12), 2012.
DOI : 10.1145/2157689.2157804

M. Lopes, F. Melo, B. Kenward, and J. Santos-Victor, A Computational Model of Social-Learning Mechanisms, Adaptive Behavior, vol.17, issue.6, 2009.
DOI : 10.1177/1059712309342757

M. Lopes, F. Melo, and L. Montesano, Active Learning for Reward Estimation in Inverse Reinforcement Learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2009.
DOI : 10.1007/978-3-642-04174-7_3

T. Luksch, M. Gienger, M. Mühlig et al., Adaptive movement sequences and predictive decisions based on hierarchical dynamical systems, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
DOI : 10.1109/IROS.2012.6385651

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.347.6270

N. Abe and H. Mamitsuka, Query learning strategies using boosting and bagging, International Conference on Machine Learning, 1998.

M. Mason and M. Lopes, Robot self-initiative and personalization by learning through repeated interactions, Proceedings of the 6th international conference on Human-robot interaction, HRI '11, 2011.
DOI : 10.1145/1957656.1957814

URL : https://hal.archives-ouvertes.fr/hal-00636164

Mausam and D. S. Weld, Planning with durative actions in stochastic domains, Journal of Artificial Intelligence Research, vol.31, 2008.

N. Mitsunaga, C. Smith, T. Kanda et al., Adapting Robot Behavior for Human-Robot Interaction, IEEE Transactions on Robotics, vol.24, issue.4, 2008.
DOI : 10.1109/TRO.2008.926867

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.475.7765

T. Munzer, Y. Mollard, and M. Lopes, Impact of Robot Initiative on Human-Robot Collaboration, Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction , HRI '17, 2017.
DOI : 10.1145/3029798.3038373

T. Munzer, B. Piot, M. Geist et al., Inverse reinforcement learning in relational domains, International Joint Conference on Artificial Intelligence, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154650

T. Munzer, M. Toussaint, and M. Lopes, Preference learning on the execution of collaborative human-robot tasks, International Conference on Robotics and Automation, 2017.

S. Natarajan, S. Joshi, P. Tadepalli et al., Imitation learning in relational domains: A functional-gradient boosting approach, International Joint Conference on Artificial Intelligence, 2011.

S. Natarajan, T. Khot, K. Kersting et al., Gradient-based boosting for statistical relational learning: The relational dependency network case, Machine Learning, 2012.
DOI : 10.1007/s10994-011-5244-9

S. Natarajan, G. Kunapuli, K. Judah et al., Multi-Agent Inverse Reinforcement Learning, 2010 Ninth International Conference on Machine Learning and Applications, 2010.
DOI : 10.1109/ICMLA.2010.65

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.5667

G. Neu and C. Szepesvári, Apprenticeship learning using inverse reinforcement learning and gradient methods, Uncertainty in Artificial Intelligence, 2007.

G. Neu and C. Szepesvári, Training parsers by inverse reinforcement learning, Machine Learning, 2009.
DOI : 10.1007/s10994-009-5110-1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.3712

A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, International Conference on Machine Learning, 1999.

A. Y. Ng and S. J. Russell, Algorithms for inverse reinforcement learning, International Conference on Machine Learning, 2000.

S. Niekum, S. Chitta, B. Marthi et al., Incremental Semantically Grounded Learning from Demonstration, Robotics: Science and Systems IX, 2013.
DOI : 10.15607/RSS.2013.IX.048

S. Nikolaidis, K. Gu, R. Ramakrishnan et al., Efficient model learning for human-robot collaborative tasks, arXiv preprint, 2014.
DOI : 10.1145/2696454.2696455

S. Nikolaidis and J. Shah, Human-robot cross-training: Computational formulation, modeling and evaluation of a human team training strategy, 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2013.
DOI : 10.1109/HRI.2013.6483499

B. Piot, M. Geist, and O. Pietquin, Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
DOI : 10.1007/978-3-642-40988-2_2

URL : https://hal.archives-ouvertes.fr/hal-00916938

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

K. Rohanimanesh and S. Mahadevan, Learning to take concurrent actions, Conference on Neural Information Processing Systems, 2002.

K. Rohanimanesh and S. Mahadevan, Coarticulation, Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005.
DOI : 10.1145/1102351.1102442

S. Russell, Learning agents for uncertain environments, Conference on Learning Theory, 1998.
DOI : 10.1145/279943.279964

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.6795

S. Sanner, Relational Dynamic Influence Diagram Language (RDDL): Language description, 2010.

S. Schaal, Is imitation learning the route to humanoid robots?, Trends in Cognitive Sciences, 1999.

A. Segre and G. F. DeJong, Explanation-based manipulator learning: Acquisition of planning ability through observation, Proceedings 1985 IEEE International Conference on Robotics and Automation, 1985.
DOI : 10.1109/ROBOT.1985.1087311

J. W. Shavlik and G. F. DeJong, BAGGER: an EBL system that extends and generalizes explanations, AAAI Conference on Artificial Intelligence, 1987.

P. Shivaswamy and T. Joachims, Online structured prediction via coactive learning, 2012.

D. E. Smith and D. S. Weld, Temporal planning with mutual exclusion reasoning, International Joint Conference on Artificial Intelligence, 1999.

A. L. Thomaz and C. Breazeal, Teachable robots: Understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, vol.172, issue.6-7, 2008.
DOI : 10.1016/j.artint.2007.09.009

URL : http://doi.org/10.1016/j.artint.2007.09.009

M. Toussaint, T. Munzer, Y. Mollard et al., Relational activity processes for modeling concurrent cooperation, 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016.
DOI : 10.1109/ICRA.2016.7487765

URL : https://hal.archives-ouvertes.fr/hal-01399247

C. J. Watkins, Learning from delayed rewards, 1989.

R. Wilcox, S. Nikolaidis, and J. Shah, Optimization of Temporal Dynamics for Adaptive Human-Robot Interaction in Assembly Manufacturing, Robotics: Science and Systems VIII, 2012.
DOI : 10.15607/RSS.2012.VIII.056

S. Yoon, A. Fern, and R. Givan, Inductive policy selection for first-order MDPs, Uncertainty in Artificial Intelligence, 2002.

H. Younes and R. G. Simmons, Policy generation for continuous-time stochastic domains with concurrency, International Conference on Automated Planning and Scheduling, 2004.

L. S. Zettlemoyer, H. Pasula, and L. P. Kaelbling, Learning planning rules in noisy stochastic worlds, AAAI Conference on Artificial Intelligence, 2005.

B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, Maximum entropy inverse reinforcement learning, AAAI Conference on Artificial Intelligence, 2008.