S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition, The Bell System Technical Journal, 1983.

L.-J. Lin, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, pp.293-321, 1992.
DOI : 10.1007/978-1-4615-3618-5_5

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.7884

A. W. Moore and C. G. Atkeson, Memory-based reinforcement learning: efficient computation with prioritized sweeping, Advances in neural information processing systems, 1993.

A. Stolcke and S. Omohundro, Hidden Markov model induction by Bayesian model merging, Advances in neural information processing systems, 1992.

R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988.
DOI : 10.1007/BF00115009

R. S. Sutton, Integrated modeling and control based on reinforcement learning and dynamic programming, Advances in neural information processing systems, 1991.

S. Thrun, Efficient exploration in reinforcement learning, Technical Report CMU-CS-92-102, 1992.

C. J. C. H. Watkins, Learning from delayed rewards, PhD thesis, King's College, Cambridge, 1989.

C. Amato, D. S. Bernstein, and S. Zilberstein, Optimizing memory-bounded controllers for decentralized POMDPs, Proc. of the Twenty-Third Conf. on Uncertainty in Artificial Intelligence (UAI-07), 2007.
DOI : 10.1007/s10458-009-9103-z

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.8934

C. Amato, D. S. Bernstein, and S. Zilberstein, Solving POMDPs using quadratically constrained linear programs, Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems (AAMAS '06), 2006.
DOI : 10.1145/1160633.1160694

URL : http://anytime.cs.umass.edu/aimath06/proceedings/P56.pdf

C. Amato, A. Carlin, and S. Zilberstein, Bounded dynamic programming for decentralized POMDPs, Proc. of the Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM) in AAMAS'07, 2007.

C. Amato, J. Dibangoye, and S. Zilberstein, Incremental policy generation for finite-horizon DEC-POMDPs, Proc. of the Nineteenth Int. Conf. on Automated Planning and Scheduling (ICAPS-09), 2009.

B. Anderson and J. Moore, Time-varying feedback laws for decentralized control, Nineteenth IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, pp.519-524, 1980.
DOI : 10.1109/TAC.1981.1102770

R. Becker, S. Zilberstein, V. Lesser, and C. Goldman, Solving transition independent decentralized Markov decision processes, Journal of Artificial Intelligence Research, vol.22, pp.423-455, 2004.
DOI : 10.1145/860575.860583

URL : http://anytime.cs.umass.edu/shlomo/papers/aamas03a.pdf

R. Bellman, Dynamic programming, 1957.

D. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, pp.819-840, 2002.
DOI : 10.1287/moor.27.4.819.297

D. S. Bernstein, E. A. Hansen, and S. Zilberstein, Bounded policy iteration for decentralized POMDPs, Proc. of the Nineteenth Int. Joint Conf. on Artificial Intelligence (IJCAI), pp.1287-1292, 2005.

A. Boularias and B. Chaib-draa, Exact dynamic programming for decentralized pomdps with lossless policy compression, Proc. of the Int. Conf. on Automated Planning and Scheduling (ICAPS'08), 2008.

C. Boutilier, Planning, learning and coordination in multiagent decision processes, Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK '96), De Zeeuwse Stromen, 1996.

O. Buffet, A. Dutech, and F. Charpillet, Shaping multi-agent systems with gradient reinforcement learning, Autonomous Agents and Multi-Agent Systems, vol.33, issue.1, pp.197-220, 2007.
DOI : 10.1007/s10458-006-9010-5

URL : https://hal.archives-ouvertes.fr/inria-00118983

A. Cassandra, L. Kaelbling, and M. Littman, Acting optimally in partially observable stochastic domains, Proc. of the 12th Nat. Conf. on Artificial Intelligence (AAAI), 1994.

I. Chadès, B. Scherrer, and F. Charpillet, A heuristic approach for solving decentralized-POMDP, Proceedings of the 2002 ACM Symposium on Applied Computing (SAC '02), pp.57-62, 2002.
DOI : 10.1145/508791.508804

G. Cornuéjols, Valid inequalities for mixed integer linear programs, Mathematical Programming, vol.112, issue.1, pp.3-44, 2008.
DOI : 10.1007/s10107-006-0086-0

G. B. Dantzig, On the Significance of Solving Linear Programming Problems with Some Integer Variables, Econometrica, vol.28, issue.1, pp.30-44, 1960.
DOI : 10.2307/1905292

F. d'Epenoux, A Probabilistic Production and Inventory Problem, Management Science, vol.10, issue.1, pp.98-108, 1963.
DOI : 10.1287/mnsc.10.1.98

U. Diwekar, Introduction to Applied Optimization, 2008.

R. Drenick, Multilinear programming: Duality theories, Journal of Optimization Theory and Applications, vol.72, issue.3, pp.459-486, 1992.
DOI : 10.1007/BF00939837

R. Fletcher, Practical Methods of Optimization, 1987.
DOI : 10.1002/9781118723203

M. Ghavamzadeh and S. Mahadevan, Learning to communicate and act in cooperative multiagent systems using hierarchical reinforcement learning, Proc. of the 3rd Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'04), 2004.

S. Govindan and R. Wilson, A global Newton method to compute Nash equilibria, Journal of Economic Theory, vol.110, issue.1, pp.65-86, 2003.
DOI : 10.1016/S0022-0531(03)00005-X

E. Hansen, D. Bernstein, and S. Zilberstein, Dynamic programming for partially observable stochastic games, Proc. of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004.

R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, 2003.

M. James and S. Singh, Learning and discovery of predictive state representations in dynamical systems with reset, Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015359

L. Kaelbling, M. Littman, and A. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998.
DOI : 10.1016/S0004-3702(98)00023-X

D. Koller, N. Megiddo, and B. von Stengel, Fast algorithms for finding randomized strategies in game trees, Proceedings of the twenty-sixth annual ACM Symposium on Theory of Computing (STOC '94), pp.750-759, 1994.
DOI : 10.1145/195058.195451

D. Koller and N. Megiddo, Finding mixed strategies with small supports in extensive form games, International Journal of Game Theory, vol.18, issue.1, pp.73-92, 1996.
DOI : 10.1007/BF01254386

C. Lemke, Bimatrix Equilibrium Points and Mathematical Programming, Management Science, vol.11, issue.7, pp.681-689, 1965.
DOI : 10.1287/mnsc.11.7.681

D. Luenberger, Linear and Nonlinear Programming, 1984.
DOI : 10.1007/978-3-319-18842-3

P. Mccracken and M. H. Bowling, Online discovery and learning of predictive state representations, Advances in Neural Information Processing Systems 18 (NIPS'05), 2005.

R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, Taming decentralized POMDPs: towards efficient policy computation for multiagent settings, Proc. of Int. Joint Conference on Artificial Intelligence, IJCAI'03, 2003.

F. Oliehoek, M. Spaan, and N. Vlassis, Optimal and approximate Q-value functions for decentralized POMDPs, Journal of Artificial Intelligence Research, vol.32, pp.289-353, 2008.

F. Oliehoek, S. Whiteson, and M. Spaan, Lossless clustering of histories in decentralized POMDPs, Proc. of The International Joint Conference on Autonomous Agents and Multi Agent Systems, pp.577-584, 2009.

M. J. Osborne and A. Rubinstein, A Course in Game Theory, 1994.

C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, 1982.

C. H. Papadimitriou and J. Tsitsiklis, The Complexity of Markov Decision Processes, Mathematics of Operations Research, vol.12, issue.3, pp.441-450, 1987.
DOI : 10.1287/moor.12.3.441

S. Parsons and M. Wooldridge, Game theory and decision theory in multi-agent systems, Autonomous Agents and Multi-Agent Systems, vol.5, issue.3, pp.243-254, 2002.
DOI : 10.1023/A:1015575522401

M. Petrik and S. Zilberstein, Average-reward decentralized Markov decision processes, Proc. of the Twentieth Int. Joint Conf. on Artificial Intelligence, 2007.

M. Petrik and S. Zilberstein, A bilinear programming approach for multiagent planning, Journal of Artificial Intelligence Research, vol.35, pp.235-274, 2009.

M. Puterman, Markov Decision Processes: discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887

D. Pynadath and M. Tambe, The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories And Models, Journal of Artificial Intelligence Research, vol.16, pp.389-423, 2002.

R. Radner, The Application of Linear Programming to Team Decision Problems, Management Science, vol.5, issue.2, pp.143-150, 1959.
DOI : 10.1287/mnsc.5.2.143

S. Russell and P. Norvig, Artificial Intelligence: A modern approach, p.395, 1995.

T. Sandholm, Multiagent systems, chap. Distributed rational decision making, pp.201-258, 1999.

T. Sandholm, A. Gilpin, and V. Conitzer, Mixed-integer programming methods for finding nash equilibria, Proc. of the National Conference on Artificial Intelligence (AAAI), 2005.

B. Scherrer and F. Charpillet, Cooperative co-learning: a model-based approach for solving multi-agent reinforcement problems, 14th IEEE International Conference on Tools with Artificial Intelligence, 2002. (ICTAI 2002). Proceedings., 2002.
DOI : 10.1109/TAI.2002.1180839

URL : https://hal.archives-ouvertes.fr/inria-00100814

S. Seuken and S. Zilberstein, Memory-bounded dynamic programming for DEC-POMDPs, Proc. of the Twentieth Int. Joint Conf. on Artificial Intelligence (IJCAI'07), 2007.

S. Singh, T. Jaakkola, and M. Jordan, Learning Without State-Estimation in Partially Observable Markovian Decision Processes, Proceedings of the Eleventh International Conference on Machine Learning, 1994.
DOI : 10.1016/B978-1-55860-335-6.50042-8

S. Singh, M. Littman, N. Jong, D. Pardoe, and P. Stone, Learning predictive state representations, Proc. of the Twentieth Int. Conf. of Machine Learning (ICML'03), 2003.

D. Szer and F. Charpillet, Point-based Dynamic Programming for DEC-POMDPs, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00104443

D. Szer, F. Charpillet, and S. Zilberstein, MAA*: A heuristic search algorithm for solving decentralized POMDPs, Proc. of the Twenty-First Conf. on Uncertainty in Artificial Intelligence (UAI'05), pp.576-583, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000204

V. Thomas, C. Bourjot, and V. Chevrier, Interac-DEC-MDP : Towards the use of interactions in DEC-MDP, Proc. of the Third Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'04), pp.1450-1451, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00108104

R. J. Vanderbei, Linear Programming: Foundations and Extensions, 2008.
DOI : 10.1057/palgrave.jors.2600987

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.111.1824

B. von Stengel, Computing equilibria for two-person games, in Handbook of Game Theory, pp.1723-1759, 2002.

J. Wu and E. H. Durfee, Mixed-integer linear programming for transition-independent decentralized MDPs, Proc. of the fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS'06), pp.1058-1060, 2006.

P. Xuan, V. Lesser, and S. Zilberstein, Communication in multi-agent Markov decision processes, Proc. of ICMAS Workshop on Game Theoretic and Decision Theoretic Agents, 2000.

R. Aras, A. Dutech, and F. Charpillet, Cooperation in stochastic games through communication, Proc. of the fourth Int. Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS'05), 2005.
URL : https://hal.archives-ouvertes.fr/inria-00000208

C. Boutilier, Planning, learning and coordination in multiagent decision processes, Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK '96), De Zeeuwse Stromen, The Netherlands, 1996.

M. Bowling and M. Veloso, Multiagent learning using a variable learning rate, Artificial Intelligence, vol.136, issue.2, pp.215-250, 2002.
DOI : 10.1016/S0004-3702(02)00121-2

A. Cassandra, M. Littman, and N. Zhang, Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes, Proc. of the Conf. on Uncertainty in Artificial Intelligence (UAI), 1997.

C. Claus and C. Boutilier, The dynamics of reinforcement learning in cooperative multiagent systems, Proc. of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pp.746-752, 1998.

A. Greenwald and K. Hall, Correlated Q-learning, Proc. of the 20th Int. Conf. on Machine Learning (ICML), 2003.

E. Hansen, D. Bernstein, and S. Zilberstein, Dynamic programming for partially observable stochastic games, Proc. of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004.

J. Hu and M. Wellman, Multiagent reinforcement learning: theoretical framework and an algorithm, Proceedings of the Fifteenth International Conference on Machine Learning, 1998.

J. Hu and M. Wellman, Nash Q-learning for general-sum stochastic games, Journal of Machine Learning Research, 2003.

M. Littman and C. Szepesvári, A generalized reinforcement-learning model: convergence and applications, Proc. of the Thirteenth Int. Conf. on Machine Learning (ICML'96), 1996.