P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, An application of reinforcement learning to aerobatic helicopter flight, Advances in Neural Information Processing Systems 19, pp.1-8, 2007.

P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430

URL : http://www.aicml.cs.ualberta.ca/banff04/icml/pages/papers/335.pdf

R. Akrour, M. Schoenauer, and M. Sebag, APRIL: Active Preference Learning-Based Reinforcement Learning, 2012.
DOI : 10.1007/978-3-642-33486-3_8

URL : https://hal.archives-ouvertes.fr/hal-00722744

J. A. Bagnell, A. Y. Ng, and J. Schneider, Solving uncertain Markov decision problems, 2001.

L. C. Baird, Advantage updating, Technical Report WL-TR-93-1146, Wright Laboratory, 1993.
DOI : 10.21236/ADA280862

N. Bäuerle and U. Rieder, Markov Decision Processes with Applications to Finance, 2011.
DOI : 10.1007/978-3-642-18324-9

R. Bellman, A Markovian Decision Process, Indiana University Mathematics Journal, vol.6, issue.4, 1957.
DOI : 10.1512/iumj.1957.6.56038

URL : http://www.dtic.mil/cgi-bin/GetTRDoc?AD=AD0606367&Location=U2&doc=GetTRDoc.pdf

J. F. Benders, Partitioning procedures for solving mixed-variables programming problems, Computational Management Science, vol.2, issue.1, pp.3-19, 2005.
DOI : 10.1007/s10287-004-0020-y

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

A. Bhattacharya and S. K. Das, LeZi-update: An information-theoretic approach to track mobile users in PCS networks, Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, MobiCom '99, 1999.
DOI : 10.1145/313451.313457

J. Boger, J. Hoey, P. Poupart, C. Boutilier, G. Fernie, and A. Mihailidis, A Planning System Based on Markov Decision Processes to Guide People With Dementia Through Activities of Daily Living, IEEE Transactions on Information Technology in Biomedicine, vol.10, issue.2, pp.323-333, 2006.
DOI : 10.1109/TITB.2006.864480

A. Boularias, J. Kober, and J. Peters, Relative Entropy Inverse Reinforcement Learning, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp.182-189, 2011.

C. Boutilier, R. Das, J. O. Kephart, G. Tesauro, and W. E. Walsh, Cooperative negotiation in autonomic systems using incremental utility elicitation, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI'03, pp.89-97, 2003.

C. Boutilier, R. Patrascu, P. Poupart, and D. Schuurmans, Constraint-based optimization and utility elicitation using the minimax decision criterion, Artificial Intelligence, vol.170, issue.8-9, pp.686-713, 2006.
DOI : 10.1016/j.artint.2006.02.003

URL : https://doi.org/10.1016/j.artint.2006.02.003

U. Chajewska, D. Koller, and R. Parr, Making rational decisions using adaptive utility elicitation, Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp.363-369, 2000.

J. Fearnley, Strategy iteration algorithms for games and Markov decision processes, PhD thesis, University of Warwick, 2010.

J. Fürnkranz, E. Hüllermeier, W. Cheng, and S. Park, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, vol.89, issue.1-2, pp.123-156, 2012.

H. Gilbert, O. Spanjaard, P. Viappiani, and P. Weng, Reducing the Number of Queries in Interactive Value Iteration, Algorithmic Decision Theory: 4th International Conference, Proceedings, pp.139-152, 2015.
DOI : 10.1007/978-3-319-23114-3_9

URL : https://hal.archives-ouvertes.fr/hal-01213280

J. Girard, Concurrent Markov decision processes for robust robot team learning under uncertainty, 2014.
DOI : 10.1016/j.engappai.2014.12.007

R. Givan, S. M. Leach, and T. L. Dean, Bounded-parameter Markov decision processes, Artificial Intelligence, vol.122, issue.1-2, pp.71-109, 2000.
DOI : 10.1016/S0004-3702(00)00047-3

URL : https://doi.org/10.1016/s0004-3702(00)00047-3

G. Z. Grudic and P. D. Lawrence, Human-to-robot skill transfer using the SPORE approximation, Proceedings of IEEE International Conference on Robotics and Automation, pp.2962-2967, 1996.
DOI : 10.1109/ROBOT.1996.509162

R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, Machine Learning, Proceedings of the Nineteenth International Conference, pp.267-274, 2002.

S. M. Kakade, On the sample complexity of reinforcement learning, PhD thesis, University College London, 2003.

E. Klein, M. Geist, B. Piot, and O. Pietquin, Inverse Reinforcement Learning through Structured Classification, Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. L. Littman, T. L. Dean, and L. P. Kaelbling, On the complexity of solving Markov decision problems, Proceedings of the Eleventh International Conference on Uncertainty in Artificial Intelligence, p.394, 1995.

H. B. McMahan, G. J. Gordon, and A. Blum, Planning in the presence of cost functions controlled by an adversary, Machine Learning, Proceedings of the Twentieth International Conference, pp.536-543, 2003.

K. Van Moffaert and A. Nowé, Multi-objective reinforcement learning using sets of Pareto dominating policies, Journal of Machine Learning Research, vol.15, pp.3663-3692, 2014.

A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, Proc. 17th International Conf. on Machine Learning, pp.663-670, 2000.

A. Nilim and L. El Ghaoui, Robustness in Markov decision problems with uncertain transition matrices, NIPS, 2004.
DOI : 10.1287/opre.1050.0216

URL : http://robotics.eecs.berkeley.edu/~elghaoui/Pubs/RobMDP_OR2005.pdf

C. H. Papadimitriou and M. Yannakakis, On the approximability of trade-offs and optimal access of Web sources, Proceedings 41st Annual Symposium on Foundations of Computer Science, p.86, 2000.
DOI : 10.1109/SFCS.2000.892068

K. Pažek and Č. Rozman, Decision making under conditions of uncertainty in agriculture: a case study of oil crops, Poljoprivreda, vol.15, issue.1, pp.45-50, 2009.

P. Perny, P. Weng, J. Goldsmith, and J. P. Hanna, Approximation of Lorenz-optimal solutions in multiobjective Markov decision processes, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01216091

O. Pietquin, Inverse reinforcement learning for interactive systems, Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems Bridging the Gap Between Perception, Action and Communication, MLIS '13, pp.71-75, 2013.
DOI : 10.1145/2493525.2493529

URL : https://hal.archives-ouvertes.fr/hal-00869812

J. Pineau, Tractable planning under uncertainty: Exploiting structure, PhD thesis, Carnegie Mellon University, 2004.

D. A. Pomerleau, Efficient Training of Artificial Neural Networks for Autonomous Navigation, Neural Computation, vol.3, issue.1, pp.88-97, 1991.
DOI : 10.1162/neco.1991.3.1.88

M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887

K. Regan and C. Boutilier, Regret-based reward elicitation for Markov decision processes, NIPS-08 Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008.

K. Regan and C. Boutilier, Regret-based reward elicitation for Markov decision processes, UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp.444-451, 2009.

K. Regan and C. Boutilier, Robust policy computation in reward-uncertain MDPs using nondominated policies, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, 2010.

K. Regan and C. Boutilier, Eliciting additive reward functions for Markov decision processes, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp.2159-2164, 2011.

K. Regan and C. Boutilier, Robust online optimization of reward-uncertain MDPs, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp.2165-2171, 2011.

D. M. Roijers, P. Vamplew, S. Whiteson, and R. Dazeley, A survey of multiobjective sequential decision-making, J. Artif. Intell. Res. (JAIR), vol.48, pp.67-113, 2013.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

P. Viappiani and C. Boutilier, Optimal Bayesian recommendation sets and myopically optimal choice query sets, Advances in Neural Information Processing Systems 23, 2010.

K. Wakuta, Vector-valued Markov decision processes and the systems of linear inequalities, Stochastic Processes and their Applications, pp.159-169, 1995.
DOI : 10.1016/0304-4149(94)00064-Z

C. J. C. H. Watkins, Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, 1989.

P. Weng, Markov decision processes with ordinal rewards: Reference point-based preferences, Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01285812

P. Weng, Ordinal Decision Models for Markov Decision Processes, European Conference on Artificial Intelligence, pp.828-833, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01273056

P. Weng and B. Zanuttini, Interactive Value Iteration for Markov Decision Processes with Unknown Rewards, Proc. 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00942290

H. Xu and S. Mannor, Parametric regret in uncertain Markov decision processes, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference, pp.3606-3613, 2009.
DOI : 10.1109/CDC.2009.5400796

A. Y. Yao and D. Kane, walkr: MCMC sampling from non-negative convex polytopes, 2015.

B. D. Ziebart, A. L. Maas, A. K. Dey, and J. A. Bagnell, Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, Proceedings of the 10th International Conference on Ubiquitous Computing, UbiComp '08, pp.322-331, 2008.
DOI : 10.1145/1409635.1409678