Bibliography

Optimization and Nonsmooth Analysis.
M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
Multiagent Reinforcement Learning: Algorithm Converging to Nash Equilibrium in General-Sum Discounted Stochastic Games, Proc. of AAMAS, p.38, 2009. ,
Fitted-Q Iteration in Continuous Action-Space MDPs, Proc. of NIPS, p.18, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00185311
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, p.21, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00117130
On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995. ,
DOI : 10.1057/jors.1995.50
Policy Search by Dynamic Programming, Proc. of NIPS, p.68, 2003.
Residual Algorithms: Reinforcement Learning with Function Approximation, Proc. of ICML, pp.17-117, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50013-X
Adaptive policy gradient in multiagent learning, Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '03, p.39, 2003.
DOI : 10.1145/860575.860686
Dynamic Programming, p.11, 1957. ,
Polynomial approximation - a new computational technique in dynamic programming: Allocation processes, Mathematics of Computation, vol.17, p.18, 1963.
D. P. Bertsekas. Dynamic Programming and Optimal Control, vol.1, p.91, 1995.
Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3
Stochastic approximation with 'controlled Markov' noise, Systems & Control Letters, 2006.
Stochastic Approximation: A Dynamical Systems Viewpoint, p.154, 2009. ,
A user's guide to solving dynamic stochastic games using the homotopy method, Operations Research, vol.58, 2010.
Algorithms for computing strategies in two-player simultaneous move games, Artificial Intelligence, vol.237, issue.150, pp.1-40, 2016. ,
DOI : 10.1016/j.artint.2016.03.005
Rational and Convergent Learning in Stochastic Games, Proc. of IJCAI, pp.39-146, 2001. ,
Classification and Regression Trees, p.52, 1984. ,
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.
Solving the Oshi-Zumo Game, pp.361-366, 2004. ,
DOI : 10.1007/978-0-387-35706-5_23
A Comprehensive Survey of Multiagent Reinforcement Learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.38, issue.2, p.155, 2008. ,
DOI : 10.1109/TSMCC.2007.913919
Online least-squares policy iteration for reinforcement learning control, American Control Conference, 2010.
Prediction, Learning, and Games, p.39, 2006. ,
DOI : 10.1017/CBO9780511546921
Directional Derivative of a Minimax Function, Nonlinear Analysis: Theory, Methods & Applications, vol.9, pp.13-22, 1985.
The Complexity of Computing a Nash Equilibrium, SIAM Journal on Computing, vol.39, issue.1, pp.195-259, 2009. ,
DOI : 10.1137/070699652
Solving stochastic games, Advances in Neural Information Processing Systems, p.38, 2009. ,
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, pp.503-556, 2005. ,
Error Propagation for Approximate Policy and Value Iteration, Proc. of NIPS, 2010.
DOI : 10.1109/tac.2015.2418411
URL : https://hal.archives-ouvertes.fr/hal-00830154
Competitive Markov Decision Processes, p.29, 2012. ,
DOI : 10.1007/978-1-4612-4054-9
On the Algorithm of Pollatschek and Avi-Itzhak, pp.32-112, 1991.
DOI : 10.1007/978-94-011-3760-7_6
Classification-Based Policy Iteration with a Critic, Proc. of ICML, pp.1049-1056 ,
URL : https://hal.archives-ouvertes.fr/hal-00590972
Deep Learning. Book in preparation for MIT Press, 2016.
Correlated Q-learning, Proc. of ICML, p.39, 2003. ,
Modelling Transition Dynamics in MDPs With RKHS Embeddings, Proc. of ICML, p.24, 2012. ,
Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, Journal of the ACM, vol.60, issue.1, 2013.
DOI : 10.1145/2432622.2432623
URL : http://arxiv.org/pdf/1008.0530
Uncoupled dynamics do not lead to Nash equilibrium, The American Economic Review, p.147, 2003.
DOI : 10.1142/9789814390705_0007
URL : http://www.ma.huji.ac.il/hart/papers/uncoupl.pdf
Fictitious self-play in extensive-form games, Proc. of ICML, 2015.
Stationary Equilibria in Stochastic Games: Structure, Selection and Computation, SSRN Electronic Journal, vol.118, p.37, 2004. ,
DOI : 10.2139/ssrn.357201
Homotopy methods to compute equilibria in game theory, Economic Theory, vol.42, 2010.
On the Global Convergence of Stochastic Fictitious Play, Econometrica, vol.70, issue.6, pp.2265-2294, 2002. ,
DOI : 10.1111/1468-0262.00376
On Nonterminating Stochastic Games, Management Science, vol.12, issue.5, pp.359-370, 1966. ,
DOI : 10.1287/mnsc.12.5.359
Nash Q-Learning for General-Sum Stochastic Games, Journal of Machine Learning Research, vol.4, pp.1039-1069, 2003. ,
Rational Learning Leads to Nash Equilibrium, Econometrica, vol.61, issue.5, p.38, 1993. ,
DOI : 10.2307/2951492
URL : http://www.kellogg.northwestern.edu/research/math/papers/895.pdf
A New Polynomial-time Algorithm for Linear Programming, Proc. of ACM Symposium on Theory of Computing, p.27, 1984. ,
DOI : 10.1007/bf02579150
Fast Planning in Stochastic Games, Proc. of UAI, p.93, 2000. ,
Policy Iteration for Factored MDPs, Proc. of UAI, pp.326-334, 2000. ,
Fast algorithms for finding randomized strategies in game trees, Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, STOC '94, p.27, 1994.
DOI : 10.1145/195058.195451
Value Function Approximation in Zero-Sum Markov Games, Proc. of UAI, p.36, 2002. ,
Least-Squares Policy Iteration, Journal of Machine Learning Research, vol.19, issue.111, pp.1107-1149, 2003. ,
Monte carlo sampling for regret minimization in extensive games, Proc. of NIPS, p.38, 2009. ,
The world of independent learners is not Markovian, International Journal of Knowledge-based and Intelligent Engineering Systems, vol.15, issue.1, 2011.
DOI : 10.3233/KES-2010-0206
URL : https://hal.archives-ouvertes.fr/hal-00601941
Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596
Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539
Generalised weakened fictitious play, Games and Economic Behavior, vol.56, issue.2, pp.285-298, 2006. ,
DOI : 10.1016/j.geb.2005.08.005
Non-stationary approximate modified policy iteration, Proc. of ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01186664
Nonsmooth Optimization via BFGS, p.106, 2009.
URL : http://www.cs.nyu.edu/faculty/overton/papers/pdffiles/nsoquasi.pdf
Nonsmooth optimization via quasi-Newton methods, Mathematical Programming, vol.128, issue.1-2, pp.135-163, 2013. ,
DOI : 10.1007/s10107-012-0514-2
URL : http://www.cs.nyu.edu/faculty/overton/papers/pdffiles/nsoquasi.pdf
Online exploration in least-squares policy iteration, Proc. of AAMAS, p.21, 2009. ,
Continuous Control with Deep Reinforcement Learning, Proc. of ICLR, 2016.
Markov games as a framework for multi-agent reinforcement learning, Proc. of ICML, p.39, 1994. ,
DOI : 10.1016/B978-1-55860-335-6.50027-1
URL : http://www.ee.duke.edu/~lcarin/emag/seminar_presentations/Markov_Games_Littman.pdf
Toward Off-Policy Learning Control with Function Approximation, Proc. of ICML, pp.719-726 ,
Finite-Sample Analysis of Bellman Residual Minimization, Proc. of ACML, pp.124-127, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00830212
Learning Strategies in Games by Anticipation, Proc. of IJCAI, pp.698-707, 1997.
URL : https://hal.archives-ouvertes.fr/hal-01649000
Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236
Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, pp.541-561, 2007.
DOI : 10.1137/040614384
URL : http://hal.archives-ouvertes.fr/docs/00/12/46/85/PDF/avi_siam_final.pdf
Finite-Time Bounds for Fitted Value Iteration, The Journal of Machine Learning Research, vol.9, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Algorithmic Game Theory, p.26, 2007. ,
DOI : 10.1017/CBO9780511800481
Numerical Optimization, 2006. ,
DOI : 10.1007/b98874
Stochastic Shortest Path Games, SIAM Journal on Control and Optimization, vol.37, issue.3, pp.31-33, 1997. ,
DOI : 10.1137/S0363012996299557
URL : http://www-mit.mit.edu/dimitrib/www/sspg.pdf
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, Proc. of ICML, pp.43-91, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01153270
On the use of non-stationary strategies for solving two-player zero-sum Markov games, Proc. of AISTATS, p.63, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01291495
Difference of convex functions programming for reinforcement learning, Proc. of NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01104419
Boosted Bellman Residual Minimization Handling Expert Demonstrations, Proc. of ECML, p.24, 2014. ,
DOI : 10.1007/978-3-662-44851-9_35
URL : https://hal.archives-ouvertes.fr/hal-01060953
Algorithms for Stochastic Games with Geometrical Interpretation, Management Science, vol.15, issue.7, pp.399-415, 1969. ,
DOI : 10.1287/mnsc.15.7.399
Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proc. of AAMAS, p.38, 2015. ,
Markov Decision Processes: Discrete Stochastic Dynamic Programming, p.53, 1994. ,
DOI : 10.1002/9780470316887
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, Proc. of ECML, p.18, 2005.
DOI : 10.1007/11564096_32
An Iterative Method of Solving a Game, The Annals of Mathematics, vol.54, issue.2, pp.296-301, 1951. ,
DOI : 10.2307/1969530
Approximate Policy Iteration Schemes: A Comparison, Proc. of ICML, p.67, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00989982
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration, Mathematics of Operations Research, vol.41, issue.3, 2016.
DOI : 10.1287/moor.2015.0753
URL : https://hal.archives-ouvertes.fr/hal-00921261
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Proc. of NIPS, pp.12-71, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00758809
Approximate Modified Policy Iteration, Proc. of ICML, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882
Stochastic Games, Proc. of the National Academy of Sciences of the United States of America, vol.39, pp.1095-1100, 1953.
Some topics in two-person games, Advances in Game Theory, p.147, 1964.
Multiagent systems: Algorithmic, game-theoretic, and logical foundations, p.26, 2008. ,
DOI : 10.1017/CBO9780511811654
If multi-agent learning is the answer, what is the question?, Artificial Intelligence, vol.171, issue.7, pp.365-377, 2007. ,
DOI : 10.1016/j.artint.2006.02.006
Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs, Proc. of UAI, p.24, 2012. ,
Discounted Markov games: Generalized policy iteration method, Journal of Optimization Theory and Applications, vol.30, issue.1, pp.125-138, 1978. ,
DOI : 10.1007/BF00933260
Q-learning, Machine Learning, vol.8, pp.279-292, 1992.
Cyclic Equilibria in Markov Games, Proc. of NIPS, pp.78-125, 2006. ,
Regret minimization in games with incomplete information, Proc. of NIPS, p.27, 2008. ,