An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition, The Bell System Technical Journal, 1983. ,
Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, pp.293-321, 1992. ,
DOI : 10.1007/978-1-4615-3618-5_5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.7884
Memory-based reinforcement learning: efficient computation with prioritized sweeping, Advances in neural information processing systems, 1993. ,
Hidden Markov model induction by Bayesian model merging, Advances in neural information processing systems, 1992. ,
Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988. ,
DOI : 10.1007/BF00115009
Integrated modeling and control based on reinforcement learning and dynamic programming, Advances in neural information processing systems, 1991. ,
Efficient exploration in reinforcement learning, Technical Report No. CMU-CS-92-102, 1992. ,
Learning from delayed rewards, PhD thesis, King's College, Cambridge, 1989. ,
Optimizing memory-bounded controllers for decentralized POMDPs, Proc. of the Twenty-Third Conf. on Uncertainty in Artificial Intelligence (UAI-07), 2007. ,
DOI : 10.1007/s10458-009-9103-z
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.8934
Solving POMDPs using quadratically constrained linear programs, Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, AAMAS '06, 2007. ,
DOI : 10.1145/1160633.1160694
URL : http://anytime.cs.umass.edu/aimath06/proceedings/P56.pdf
Bounded dynamic programming for decentralized POMDPs, Proc. of the Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM) in AAMAS'07, 2007. ,
Incremental policy generation for finite-horizon DEC-POMDPs, Proc. of the Nineteenth Int. Conf. on Automated Planning and Scheduling (ICAPS-09), 2009. ,
Time-varying feedback laws for decentralized control, Nineteenth IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, pp.519-524, 1980. ,
DOI : 10.1109/TAC.1981.1102770
Solving transition independent decentralized Markov decision processes, Journal of Artificial Intelligence Research, vol.22, pp.423-455, 2004. ,
DOI : 10.1145/860575.860583
URL : http://anytime.cs.umass.edu/shlomo/papers/aamas03a.pdf
Dynamic programming, 1957. ,
The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, pp.819-840, 2002. ,
DOI : 10.1287/moor.27.4.819.297
Bounded policy iteration for decentralized POMDPs, Proc. of the Nineteenth Int. Joint Conf. on Artificial Intelligence (IJCAI), pp.1287-1292, 2005. ,
Exact dynamic programming for decentralized POMDPs with lossless policy compression, Proc. of the Int. Conf. on Automated Planning and Scheduling (ICAPS'08), 2008. ,
Planning, learning and coordination in multiagent decision processes, Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK '96), De Zeeuwse Stromen, 1996. ,
Shaping multi-agent systems with gradient reinforcement learning, Autonomous Agents and Multi-Agent Systems, vol.33, issue.1, pp.197-220, 2007. ,
DOI : 10.1007/s10458-006-9010-5
URL : https://hal.archives-ouvertes.fr/inria-00118983
Acting optimally in partially observable stochastic domains, Proc. of the 12th Nat. Conf. on Artificial Intelligence (AAAI), 1994. ,
A heuristic approach for solving decentralized-POMDP, Proceedings of the 2002 ACM symposium on Applied computing, SAC '02, pp.57-62, 2002. ,
DOI : 10.1145/508791.508804
Valid inequalities for mixed integer linear programs, Mathematical Programming, vol.30, issue.1, pp.3-44, 2008. ,
DOI : 10.1007/s10107-006-0086-0
On the Significance of Solving Linear Programming Problems with Some Integer Variables, Econometrica, vol.28, issue.1, pp.30-44, 1960. ,
DOI : 10.2307/1905292
A Probabilistic Production and Inventory Problem, Management Science, vol.10, issue.1, pp.98-108, 1963. ,
DOI : 10.1287/mnsc.10.1.98
Introduction to Applied Optimization, 2008. ,
Multilinear programming: Duality theories, Journal of Optimization Theory and Applications, vol.3, issue.3, pp.459-486, 1992. ,
DOI : 10.1007/BF00939837
Practical Methods of Optimization, 1987. ,
DOI : 10.1002/9781118723203
Learning to communicate and act in cooperative multiagent systems using hierarchical reinforcement learning, Proc. of the 3rd Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'04), 2004. ,
A global Newton method to compute Nash equilibria, Journal of Economic Theory, vol.110, issue.1, pp.65-86, 2001. ,
DOI : 10.1016/S0022-0531(03)00005-X
Dynamic programming for partially observable stochastic games, Proc. of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004. ,
Global Optimization: Deterministic Approaches, 2003. ,
Learning and discovery of predictive state representations in dynamical systems with reset, Twenty-first international conference on Machine learning, ICML '04, 2004. ,
DOI : 10.1145/1015330.1015359
Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998. ,
DOI : 10.1016/S0004-3702(98)00023-X
Fast algorithms for finding randomized strategies in game trees, Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, STOC '94, pp.750-759, 1994. ,
DOI : 10.1145/195058.195451
Finding mixed strategies with small supports in extensive form games, International Journal of Game Theory, vol.18, issue.1, pp.73-92, 1996. ,
DOI : 10.1007/BF01254386
Bimatrix Equilibrium Points and Mathematical Programming, Management Science, vol.11, issue.7, pp.681-689, 1965. ,
DOI : 10.1287/mnsc.11.7.681
Linear and Nonlinear Programming, 1984. ,
DOI : 10.1007/978-3-319-18842-3
Online discovery and learning of predictive state representations, Advances in Neural Information Processing Systems 18 (NIPS'05), 2005. ,
Taming decentralized POMDPs: towards efficient policy computation for multiagent settings, Proc. of Int. Joint Conference on Artificial Intelligence, IJCAI'03, 2003. ,
Optimal and approximate Q-value functions for decentralized POMDPs, Journal of Artificial Intelligence Research, vol.32, pp.289-353, 2008. ,
Lossless clustering of histories in decentralized POMDPs, Proc. of The International Joint Conference on Autonomous Agents and Multi Agent Systems, pp.577-584, 2009. ,
A Course in Game Theory, 1994. ,
Combinatorial Optimization: Algorithms and Complexity, 1982. ,
The Complexity of Markov Decision Processes, Mathematics of Operations Research, vol.12, issue.3, pp.441-450, 1987. ,
DOI : 10.1287/moor.12.3.441
Game theory and decision theory in multi-agent systems, Autonomous Agents and Multi-Agent Systems, vol.5, issue.3, pp.243-254, 2002. ,
DOI : 10.1023/A:1015575522401
Average-reward decentralized Markov decision processes, Proc. of the Twentieth Int. Joint Conf. on Artificial Intelligence, 2007. ,
A bilinear programming approach for multiagent planning, Journal of Artificial Intelligence Research, vol.35, pp.235-274, 2009. ,
Markov Decision Processes: discrete stochastic dynamic programming, 1994. ,
DOI : 10.1002/9780470316887
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories And Models, Journal of Artificial Intelligence Research, vol.16, pp.389-423, 2002. ,
The Application of Linear Programming to Team Decision Problems, Management Science, vol.5, issue.2, pp.143-150, 1959. ,
DOI : 10.1287/mnsc.5.2.143
Artificial Intelligence: A modern approach, p.395, 1995. ,
Distributed rational decision making, in Multiagent Systems, pp.201-258, 1999. ,
Mixed-integer programming methods for finding Nash equilibria, Proc. of the National Conference on Artificial Intelligence (AAAI), 2005. ,
Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems, Proc. of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2002), 2002. ,
DOI : 10.1109/TAI.2002.1180839
URL : https://hal.archives-ouvertes.fr/inria-00100814
Memory-bounded dynamic programming for DEC-POMDPs, Proc. of the Twentieth Int. Joint Conf. on Artificial Intelligence (IJCAI'07), 2007. ,
Learning Without State-Estimation in Partially Observable Markovian Decision Processes, Proceedings of the Eleventh International Conference on Machine Learning, 1994. ,
DOI : 10.1016/B978-1-55860-335-6.50042-8
Learning predictive state representations, Proc. of the Twentieth Int. Conf. of Machine Learning (ICML'03), 2003. ,
Point-based Dynamic Programming for DEC-POMDPs, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00104443
MAA*: A heuristic search algorithm for solving decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in Artificial Intelligence (UAI'05), pp.576-583, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00000204
Interac-DEC-MDP: Towards the use of interactions in DEC-MDP, Proc. of the Third Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'04), pp.1450-1451, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00108104
Linear Programming: Foundations and Extensions, 2008. ,
DOI : 10.1057/palgrave.jors.2600987
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.111.1824
Computing equilibria for two-person games, Handbook of Game Theory, pp.1723-1759, 2002. ,
Mixed-integer linear programming for transition-independent decentralized MDPs, Proc. of the fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS'06), pp.1058-1060, 2006. ,
Communication in multi-agent Markov decision processes, Proc. of ICMAS Workshop on Game Theoretic and Decision Theoretic Agents, 2000. ,
Cooperation in stochastic games through communication, Proc. of the fourth Int. Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'05), 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00000208
Multiagent learning using a variable learning rate, Artificial Intelligence, vol.136, issue.2, pp.215-250, 2002. ,
DOI : 10.1016/S0004-3702(02)00121-2
Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes, Proc. of the Conf. on Uncertainty in Artificial Intelligence (UAI), 1997. ,
The dynamics of reinforcement learning in cooperative multiagent systems, pp.746-752, 1998. ,
Correlated Q-learning, Proc. of the 20th Int. Conf. on Machine Learning (ICML), 2003. ,
Multiagent reinforcement learning: theoretical framework and an algorithm, Proceedings of the Fifteenth International Conference on Machine Learning, pp.98-242, 1998. ,
Nash Q-learning for general-sum stochastic games, Journal of Machine Learning Research, 2003. ,
A generalized reinforcement-learning model: Convergence and applications, Proc. of the Thirteenth Int. Conf. on Machine Learning (ICML'96), 1996. ,