In this way, an agent can be incited to perform actions over the long term: by modifying the memory of an agent B, an agent A modifies B's internal state and the actions that B will be able to trigger afterwards. One can easily imagine, for example, that through learning, B learns to execute a certain action in the future because A has taught it to couple the social rewards it will receive with the actions it can perform.

In some cases, learning makes it possible to synchronize these policies and to bring about the emergence of

[BC01] C. Bourjot and V. Chevrier, Multi-agent simulation in biology: application to social spiders case, Agent Based Simulation Workshop II, pp.18-23, 2001.

C. Bourjot, V. Chevrier, and V. Thomas, A new swarm mechanism based on social spiders colonies: from web weaving to region detection, Web Intelligence and Agent Systems, An International Journal - WIAS, vol.1, issue.1, pp.47-64, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00099635

F. Le Ber, A. Dury, and V. Chevrier, Un modèle multi-agents pour la simulation en agronomie : usages et comparaisons, 1998.

E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, 1999.

D. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research, vol.27, issue.4, pp.819-840, 2002.
DOI : 10.1287/moor.27.4.819.297

C. Boutilier, Sequential optimality and coordination in multiagent systems, IJCAI '99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp.478-485, 1999.

Bisognin and S. Pesty, Emotions et systèmes multi-agents : une architecture d'agents émotionnels, pp.307-320.

R. A. Brooks, Intelligence without representation, Artificial Intelligence, vol.47, issue.1-3, pp.139-159, 1991.
DOI : 10.1016/0004-3702(91)90053-M
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.1680

E. Bonabeau, G. Theraulaz, and J. L. Deneubourg, Mathematical model of self-organizing hierarchies in animal societies, Bulletin of Mathematical Biology, vol.58, issue.4, pp.661-717, 1996.
DOI : 10.1007/BF02459478

O. Buffet, Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs, 2003.

R. Becker, S. Zilberstein, and V. Lesser, Decentralized Markov decision processes with event-driven interactions, AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004.

R. Becker, S. Zilberstein, V. Lesser, and C. Goldman, Transition-independent decentralized Markov decision processes, Proceedings of the second international joint conference on Autonomous agents and multiagent systems, AAMAS '03, pp.41-48, 2003.
DOI : 10.1145/860575.860583

C. Claus and C. Boutilier, The dynamics of reinforcement learning in cooperative multiagent systems, AAAI/IAAI, pp.746-752, 1998.

[CCF+99] R. S. Cost, Y. Chen, T. Finin, Y. Labrou et al., Modeling agent conversations with colored Petri nets, Proc. of the Workshop on Specifying and Implementing Conversation Policies, pp.59-66, 1999.

S. Camazine, N. Franks, J. Sneyd, E. Bonabeau, J. L. Deneubourg, and G. Theraulaz, Self-Organization in Biological Systems, 2001.

I. Chadès, Planification distribuée dans les systèmes multi-agents à l'aide de processus décisionnels de Markov, 2002.

A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien, Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96, 1996.
DOI : 10.1109/IROS.1996.571080

A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, Acting optimally in partially observable stochastic domains, Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pp.1023-1028, 1994.

M. C. Cotel, V. Thomas, C. Bourjot, D. Desor, V. Chevrier et al., Processus cognitifs et différenciation sociale de groupes de rats : intérêt de la modélisation multi-agents, 6e colloque des jeunes chercheurs en sciences cognitives, Mai

M. Dorigo and G. Di Caro, The ant colony optimization meta-heuristic, New Ideas in Optimization, pp.11-32, 1999.

Y. Demazeau, Steps towards multi-agent oriented programming, 1st International Workshop on Multi-Agent Systems (IWMAS'97), 1997.

[DF93a] A. Drogoul and J. Ferber, From Tom Thumb to the Dockers: Some experiments with foraging robots, From Animals to Animats II, pp.451-459, 1993.

[DF93b] A. Drogoul and J. Ferber, From Tom Thumb to the Dockers: Some experiments with foraging robots, From Animals to Animats II, 1993.

J. L. Deneubourg and S. Goss, Collective patterns and decision-making, Ethology Ecology & Evolution, vol.1, issue.4, pp.295-311, 1989.
DOI : 10.1111/j.1474-919X.1973.tb01990.x

[Diz03] A. St-Dizier, Les processus décentralisés d'organisation et leurs applications potentielles, Diplôme de DEA, Université Henri Poincaré, 2003.

M. d'Inverno, D. Kinny, and M. Luck, Interaction protocols in Agentis, ICMAS'98, Third International Conference on Multi-Agent Systems, pp.112-119, 1998.

D. Desor, B. Krafft, A. M. Toniolo, and P. Dickes, Social cognition in rats: incentive behaviour related to food supply, 1991.

E. H. Durfee, V. R. Lesser, and D. D. Corkill, Trends in cooperative distributed problem solving, IEEE Transactions on Knowledge and Data Engineering, vol.1, issue.1, pp.63-83, 1989.
DOI : 10.1109/69.43404

M. Dorigo, V. Maniezzo, and A. Colorni, Ant system: optimization by a colony of cooperating agents, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.26, issue.1, pp.29-41, 1996.
DOI : 10.1109/3477.484436

A. Dutech and M. Samuelides, Apprentissage par renforcement pour les processus décisionnels de Markov partiellement observés, Revue d'Intelligence Artificielle.

[DT92] D. Desor and A. M. Toniolo, Incentive behaviour in structure groups of rats: about the possible occurrence of socio-cognitive processes, Comparative approach in sciences cognitives, 1992.

A. Dury, Modélisation des interactions dans les systèmes multi-agents, 2000.

A. Dury, G. Vakanas, C. Bourjot, V. Chevrier et al., Using multi-agent system to model prey capture in social spiders, ESS01 13th European Simulation Symposium, pp.831-833.

J. Ferber, Les systèmes multi-agents. Vers une intelligence collective. InterEditions, 1995.

J. Ferber, Les systèmes multi-agents : un aperçu général, Technique et science informatique, pp.979-1012, 1997.

T. Finin, R. Fritzson, D. McKay, and R. McEntire, KQML as an agent communication language, Proceedings of the third international conference on Information and knowledge management, CIKM '94, pp.456-463, 1994.
DOI : 10.1145/191246.191322

J. Ferber and E. Jacopin, The framework of eco-problem solving, Decentralized A.I. 2: Proc. of the 2nd European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pp.181-193, 1991.

J. Ferber and J. P. Müller, Influences and reaction: A model of situated multiagent systems, International Conference on Multi-Agent Systems.

C. Goldman, M. Allen, and S. Zilberstein, Decentralized language learning through acting, Proc. 3rd Intl. Joint Conf. on Autonomous Agents and Multi Agent Systems, pp.1006-1013, 2004.

O. Gutknecht and J. Ferber, Un méta-modèle organisationnel pour l'analyse, la conception et l'exécution de systèmes multi-agents, Actes des Journées Francophones sur les Systèmes Multi-Agents 98, 1998.

C. Guestrin, D. Koller, and R. Parr, Multiagent planning with factored MDPs, Advances in Neural Information Processing Systems, pp.1523-1530, 2001.

C. Guestrin, D. Koller, R. Parr, and S. Venkataraman, Efficient solution algorithms for factored MDPs, Journal of Artificial Intelligence Research (JAIR), vol.19, pp.399-468, 2003.

C. Guestrin, M. Lagoudakis, and R. Parr, Coordinated reinforcement learning, Nineteenth International Conference on Machine Learning, pp.227-234.

[Gra59] P. Grassé, La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp., la théorie de la stigmergie : essais d'interprétation du comportement des termites constructeurs, pp.41-84, 1959.

C. Guestrin, Planning Under Uncertainty in Complex Structured Environments, 2003.

C. Goldman and S. Zilberstein, Optimizing information exchange in cooperative multi-agent systems, Proceedings of the second international joint conference on Autonomous agents and multiagent systems, AAMAS '03, 2003.
DOI : 10.1145/860575.860598

C. Goldman and S. Zilberstein, Decentralized control of cooperative systems: Categorization and complexity analysis, J. Artif. Intell. Res. (JAIR), vol.22, pp.143-174, 2004.

G. Hardin, The Tragedy of the Commons, Science, vol.162, issue.3859, pp.1243-1248, 1968.
DOI : 10.1126/science.162.3859.1243

E. Horvitz, J. Breese, and M. Henrion, Decision theory in expert systems and artificial intelligence, Journal of Approximate Reasoning, Special Issue on Uncertainty in Artificial Intelligence, pp.247-302, 1988.

E. Hansen, D. Bernstein, and S. Zilberstein, Dynamic programming for partially observable stochastic games, AAAI, pp.709-715, 2004.

[Hem96] C. Hemelrijk, Dominance interactions, spatial dynamics and emergent reciprocity in a virtual world, Fourth International Conference on Simulation of Adaptive Behavior, pp.545-552, 1996.

[Hem99] C. Hemelrijk, An individual-oriented model on the emergence of despotic and egalitarian societies, Proceedings of the Royal Society London B: Biological Sciences, pp.361-369, 1999.

[Hem00] C. Hemelrijk, Towards the integration of social dominance and spatial structure, Animal Behaviour, pp.1035-1048, 2000.

M. R. Jean, Emergence et SMA, JFSMA97, pp.323-342, 1997.

C. Jost, R. Jeanson, J. L. Deneubourg, C. Rivault, and G. Theraulaz, Self-organized aggregation in cockroaches: sensitivity to model structure, International Workshop on Self-Organization and Evolution of Social Behaviour, 2002.

[Joh02] N. Johnson, The development of collective structure and its response to environmental change, International Workshop on Self-Organization and Evolution of Social Behaviour, pp.215-237, 2002.

J. Jozefowiez, Conditionnement opérant et problèmes décisionnels de Markov, partie : Reinforcement learning and conditioning: an overview, 2001.

N. Jennings, K. Sycara, and M. Wooldridge, A roadmap of agent research and development, Journal of Autonomous Agents and Multi-Agent Systems, vol.1, issue.1, pp.7-38, 1998.

O. P. Judson, The rise of the individual-based model in ecology, Trends in Ecology & Evolution, vol.9, issue.1, pp.372-377, 1994.
DOI : 10.1016/0169-5347(94)90225-9

J. R. Krebs and N. B. Davies, An introduction to behavioral ecology, Oxford: Blackwell Science, 1993.

D. Keil and D. Goldin, Modeling indirect interaction in open computational systems, 1st Int'l Workshop on Theory and Practice of Open Computational Systems, 2003.

E. Lumer and B. Faieta, Diversity and adaptation in populations of clustering ants, From Animals to Animats 3, pp.501-508, 1994.

[Lit94a] M. Littman, Markov games as a framework for multi-agent reinforcement learning, 1994.

[Lit94b] M. Littman, Memoryless policies: Theoretical limitations and practical results, Simulation of Adaptive Behaviour (SAB-94), pp.238-245, 1994.

[Lit94c] M. Littman, The witness algorithm: Solving partially observable Markov decision processes, 1994.

P. De Loor, C. Septseault, and P. Chevaillier, Les émotions : une métaphore pour la résolution de problèmes dynamiques distribués, Déploiement des systèmes multi-agents, vers un passage à l'échelle, JFSMA'2003, Revue des sciences et technologies de l'information, pp.331-344, 2003.

MAIA, Rapport d'avant projet MAIA, 2002.

[MHK+98] N. Meuleau, M. Hauskrecht, K. Kim, L. Peshkin et al., Solving very large weakly coupled Markov decision processes, AAAI/IAAI, pp.165-172, 1998.

P. Mathieu and M. Verrons, ANTS: an API for creating negotiation applications, Proceedings of the 10th ISPE International Conference on Concurrent Engineering: Research and Applications (CE2003), 2003.
URL : https://hal.archives-ouvertes.fr/hal-00731974

[Par97] H. Van Dyke Parunak, Go to the ant: Engineering principles from natural agent systems, Annals of Operations Research, pp.69-101, 1997.

[Par98] R. Parr, Flexible decomposition algorithms for weakly coupled Markov decision problems, Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pp.422-430, 1998.

S. Picault, Modèles de comportements sociaux pour les collectivités de robots et d'agents, 2001.

D. Pynadath and M. Tambe, The communicative multi-agent team decision problem: analyzing teamwork theories and models, Journal of Artificial Intelligence Research, vol.16, pp.389-423, 2002.

M. L. Puterman, Markov Decision Processes : Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

S. Papendick, J. Wellner, and W. Dilger, Society first and then minds: Self organisation of a social symbol system by learning agents, International Workshop on Self organisation and evolution of social behaviour, pp.323-333, 2002.

Ribeiro and Y. Demazeau, A dynamic interaction model for multi-agent systems, Proceedings of the 2nd Iberoamerican Workshop on Distributed Artificial Intelligence and Multi-Agent Systems, pp.27-36, 1998.

C. Reynolds, Flocks, herds, and schools: A distributed behavioral model, SIGGRAPH '87, 1987.

A. S. Rao and M. P. Georgeff, BDI-agents: from theory to practice, Proceedings of the First Intl. Conference on Multiagent Systems, 1995.

S. Russell and P. Norvig, Artificial Intelligence: a modern approach, 1995.

R. Rescorla and A. Wagner, A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, Classical conditioning II: Current research and theory, pp.64-99, 1972.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

D. Szer, F. Charpillet, and S. Zilberstein, MAA*, a heuristic search algorithm for solving decentralized DEC-POMDPs, 2005.

[SGP03] Y. Shoham, T. Grenager, and R. Powers, Multi-agent reinforcement learning: A critical survey, 2003.

C. Shalizi, Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata, 2001.

[Sig03] O. Sigaud, Comportements adaptatifs pour les agents dans des environnements informatiques complexes, 2003.

O. Sigaud, Comportements adaptatifs pour des agents dans des environnements informatiques complexes. Habilitation à Diriger des Recherches de l'Université PARIS 6, 2004.

O. Simonin, Le modèle satisfaction-altruisme, 2001.

[Smi88] R. Smith, The contract net protocol: High-level communication and control in a distributed problem solver, Readings in Distributed Artificial Intelligence, pp.357-366, 1988.

H. Schroeder, A. M. Toniolo, A. Nehlig, and D. Desor, Long-term effects of early diazepam exposure on social differentiation in adult male rats subjected to the diving-for-food situation, Behavioural Neurosciences, vol.112, pp.1209-1217, 1998.

J. Schneider, W. Wong, A. Moore, and M. Riedmiller, Distributed value functions, Proceedings of the 16th International Conference on Machine Learning, pp.371-378, 1999.

V. Thomas, C. Bourjot, and V. Chevrier, Interac-DEC-MDP: Towards the use of interactions in DEC-MDP, Third International Joint Conference on Autonomous Agents and Multi-Agent Systems - AAMAS'04, pp.1450-1451, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00108104

V. Thomas, C. Bourjot, and V. Chevrier, Un formalisme pour la construction automatique d'interactions dans les SMA réactifs, Journées Francophones sur les Systèmes Multi-Agents - JFSMA 2004, 2004.

V. Thomas, C. Bourjot, and V. Chevrier, Heuristique pour l'apprentissage automatique décentralisé d'interactions dans un système multi-agents réactif, p.6, 2006.

V. Thomas, C. Bourjot, V. Chevrier, and D. Desor, MAS and rats: Multi-agent simulation of social differentiation in rats groups, International Workshop on Self-Organization and Evolution of Social Behaviour, 2002.

V. Thomas, C. Bourjot, V. Chevrier, and D. Desor, Hamelin: A model for collective adaptation based on internal stimuli, From Animals to Animats 8 - Eighth International Conference on the Simulation of Adaptive Behaviour 2004 - SAB'04, pp.425-434, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00108103

V. Thomas, C. Bourjot, and V. Chevrier, Un formalisme pour la construction automatique d'interactions dans les SMA réactifs (version longue), 2005.

[Thr92] S. Thrun, Efficient exploration in reinforcement learning, 1992.

K. Tumer and D. Wolpert, Collective intelligence and Braess' paradox, Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp.104-109, 2000.

C. J. Watkins and P. Dayan, Technical note: Q-learning, Machine Learning, pp.279-292, 1992.

G. Weiss, Multiagent systems: a modern approach to distributed artificial intelligence, 1999.

S. W. Wilson, The animat path to AI, From Animals to Animats, pp.15-21, 1991.

M. Wooldridge and N. R. Jennings, Intelligent agents: theory and practice, The Knowledge Engineering Review, vol.10, issue.02, pp.115-152, 1995.
DOI : 10.1017/S0269888900008122

G. Wang and S. Mahadevan, Hierarchical optimization of policy-coupled semi-Markov decision processes, ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning, pp.464-473, 1999.

M. Wooldridge, Introduction to Multiagent Systems, 2001.

Stochastic games extend matrix games and make it possible to represent dynamic systems: a stochastic game is defined as [SGP03] a tuple <n, S, A, R, T>, with A = A_1, …, A_n where A_i is the set of actions available to agent i, and R = R_1, …
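
As an illustration only, the tuple n, S, A, R, T can be written down as a small data structure. This is a minimal sketch, assuming the usual reading in which each R_i maps a state and a joint action to a real payoff and T maps a state and a joint action to a probability distribution over next states; neither form is spelled out in the text here. Every identifier below (StochasticGame, step, make_coordination_game) is hypothetical and chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]


@dataclass
class StochasticGame:
    """Hypothetical container for the tuple <n, S, A, R, T>."""
    n: int                                    # number of agents
    states: List[State]                       # S
    actions: List[List[str]]                  # A = A_1, ..., A_n (one list per agent)
    # Assumption: one reward function per agent, R_i(state, joint_action) -> float
    rewards: List[Callable[[State, JointAction], float]]
    # Assumption: T(state, joint_action) -> {next_state: probability}
    transition: Callable[[State, JointAction], Dict[State, float]]

    def step(self, state: State, joint_action: JointAction, rng: random.Random):
        """Play one stage: return the sampled next state and each agent's payoff."""
        if len(joint_action) != self.n:
            raise ValueError("one action per agent is required")
        for i, action in enumerate(joint_action):
            if action not in self.actions[i]:
                raise ValueError(f"action {action!r} is not in A_{i + 1}")
        payoffs = [r(state, joint_action) for r in self.rewards]
        dist = self.transition(state, joint_action)
        next_state = rng.choices(list(dist.keys()), weights=list(dist.values()))[0]
        return next_state, payoffs


def make_coordination_game() -> StochasticGame:
    """Toy two-agent, two-state game invented for this example."""
    def reward(state: State, joint: JointAction) -> float:
        # Both agents are paid 1 when they choose the same action.
        return 1.0 if joint[0] == joint[1] else 0.0

    def transition(state: State, joint: JointAction) -> Dict[State, float]:
        # Coordinating leads to s1, failing to coordinate leads to s0.
        return {"s1": 1.0} if joint[0] == joint[1] else {"s0": 1.0}

    return StochasticGame(
        n=2,
        states=["s0", "s1"],
        actions=[["a", "b"], ["a", "b"]],
        rewards=[reward, reward],
        transition=transition,
    )
```

A matrix game is recovered as the special case with a single state, which is why the state set and the transition function are the only additions needed to represent dynamics.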