P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the Twenty-First International Conference on Machine Learning, 2004.

J. S. Albus, A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Journal of Dynamic Systems, Measurement, and Control, vol.97, issue.3, pp.220-227, 1975.
DOI : 10.1115/1.3426922

T. Andou, Andhill-98: A RoboCup Team which Reinforces Positioning with Observation, pp.338-345, 1998.
DOI : 10.1007/3-540-48422-1_27

C. G. Atkeson and S. Schaal, Robot learning from demonstration, Proc. 14th International Conference on Machine Learning, pp.12-20, 1997.

P. Bakker and Y. Kuniyoshi, Robot see, robot do: An overview of robot imitation, AISB96 Workshop on Learning in Robots and Animals, 1996.

R. E. Bellman, Dynamic Programming, 1957.

D. C. Bentivegna and C. G. Atkeson, Learning How to Behave from Observing Others, 2002.

D. C. Bentivegna, A. Ude, C. G. Atkeson, and G. Cheng, Humanoid robot learning and game playing using PC-based vision, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2449-2454, 2002.
DOI : 10.1109/IRDS.2002.1041635

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.8175


D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments, 1985.
DOI : 10.1007/978-94-015-3711-7

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, p.512, 1996.

B. Marthi, Automatic shaping and decomposition of reward functions, In ICML, vol.227, pp.601-608, 2007.

C. Boutilier, Sequential Optimality and Coordination in Multiagent Systems, 1999.

J. A. Boyan and A. W. Moore, Generalization in reinforcement learning: Safely approximating the value function, Advances in Neural Information Processing Systems, vol.7, pp.369-376, 1995.

R. I. Brafman and M. Tennenholtz, R-max - a general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res., vol.3, pp.213-231, 2003.

A. J. Champandard, An Overview of the AI in Football Games from Cheating to Machine Learning, 2008.

C. Claus and C. Boutilier, The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, AAAI-97 Workshop on Multiagent Learning, 1997.

M. Colombetti, M. Dorigo, and G. Borghi, Robot shaping: The HAMSTER Experiment, 1996.

A. Cornuéjols and L. Miclet, Apprentissage artificiel : concepts et algorithmes, Editions Eyrolles, p.620, 2002.

R. H. Crites and A. G. Barto, Improving elevator performance using reinforcement learning, Advances in Neural Information Processing Systems, vol.8, pp.1017-1023, 1996.

F. Dahl, JellyFish Backgammon, 1998.

T. Degris, Apprentissage par renforcement dans les processus de décision Markoviens factorisés, PhD thesis, Université Pierre et Marie Curie, Paris.

T. G. Dietterich, The MAXQ method for hierarchical reinforcement learning, pp.118-126, 1998.

M. Dorigo and M. Colombetti, Robot shaping: developing autonomous agents through learning, Artificial Intelligence, vol.71, issue.2, pp.321-370, 1994.
DOI : 10.1016/0004-3702(94)90047-7

M. Dorigo and M. Colombetti, Robot Shaping: An Experiment in Behavior Engineering, MIT Press, p.300, 1997.

E. Dybsand, N. Kirby, and S. Woodcock, AI in Computer Games: AI for Beginners Discussion, Roundtable Handouts, 2001.

F. Fernández and M. Veloso, Probabilistic policy reuse in a reinforcement learning agent, the fifth international joint conference on Autonomous agents and multiagent systems, 2006.

D. Filliat, Robotique Mobile, 2004.

O. Gies and B. Chaib-draa, Apprentissage de la coordination multiagent : une méthode basée sur le Q-learning par jeu adaptatif, Revue d'intelligence artificielle, vol.20, issue.2-3, pp.383-410, 2006.
DOI : 10.3166/ria.20.383-410

R. A. Hess and A. Modjtahedzadeh, A preview control model of driver steering behavior, pp.504-509, 1989.

K. Jun, M. Sung, and B. Choi, Steering Behavior Model of Visitor NPCs in Virtual Exhibition, in: Advances in Artificial Reality and Tele-Existence, Heidelberg, vol.4282, pp.113-121, 2006.

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, pp.99-134, 1998.

M. Kearns and S. Singh, Finite-sample convergence rates for Q-learning and indirect algorithms, Proceedings of the 1998 conference on Advances in neural information processing systems II, pp.996-1002, 1999.

H. Kitano, M. Asada, and Y. Kuniyoshi, RoboCup, Proceedings of the First International Conference on Autonomous Agents, AGENTS '97, pp.19-24, 1997.
DOI : 10.1145/267658.267738

G. Konidaris and A. G. Barto, Autonomous shaping: knowledge transfer in reinforcement learning, In ICML, vol.148, pp.489-496, 2006.

Y. Liu and P. Stone, Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping, Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp.415-435, 2006.

S. Mahadevan, Grid World, 1997.

M. J. Mataric, Reward Functions for Accelerated Learning, Proceedings of the Eleventh International Conference on Machine Learning, pp.181-189, 1994.
DOI : 10.1016/B978-1-55860-335-6.50030-1

M. J. Mataric, Reinforcement Learning in the Multi-Robot Domain, Autonomous Robots, vol.4, issue.1, pp.73-83, 1997.
DOI : 10.1023/A:1008819414322

S. Mavromatis, J. Baratgin, and J. Sequeira, Analyzing team sport strategies by means of graphical simulation: Toward the design of a simulator to analyze team sport strategies, ICISP 2003, International Conference on Image and Signal Processing, Agadir (Morocco), June 2003.

D. Michie and R. A. Chambers, BOXES: An experiment in adaptive control, Machine Intelligence 2, Edinburgh: Oliver and Boyd, pp.137-152, 1968.

C. Nehaniv and K. Dautenhahn, Mapping between dissimilar bodies: Affordances and the algebraic foundations of imitation.

A. Y. Ng and S. J. Russell, Algorithms for Inverse Reinforcement Learning, Proceedings of the Seventeenth International Conference on Machine Learning, pp.663-670, 2000.

J. Nie and S. Haykin, A dynamic channel assignment policy through Q-learning, IEEE Transactions on Neural Networks, vol.10, issue.6, pp.1443-1455, 1999.

J. Nie and S. Haykin, A Q-learning-based dynamic channel assignment technique for mobile communication systems, IEEE Transactions on Vehicular Technology, vol.48, issue.5, pp.1676-1687, 1999.

G. B. Peterson, A day of great illumination: B. F. Skinner's discovery of shaping., Journal of the Experimental Analysis of Behavior, vol.82, issue.3, 2004.
DOI : 10.1901/jeab.2004.82-317

B. Price and C. Boutilier, Implicit imitation in multiagent reinforcement learning, Proc. 16th International Conf. on Machine Learning, pp.325-334, 1999.

B. Price and C. Boutilier, Imitation and Reinforcement learning in agents with heterogeneous actions, Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), 2000.

B. Price and C. Boutilier, Accelerating Reinforcement Learning through Implicit Imitation, Journal of Artificial Intelligence Research (JAIR), vol.19, pp.569-629, 2003.

J. Randløv and P. Alstrøm, Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, In ICML, Morgan Kaufmann, pp.463-471, 1998.

B. Ratitch, Software for RL in C++.

S. J. Russell and P. Norvig, Artificial intelligence : a modern approach, p.932, 1995.

J. C. Santamaria and R. S. Sutton, A Standard Interface for Reinforcement Learning Software, 1996.

O. G. Selfridge, R. S. Sutton, and A. G. Barto, Training and Tracking in Robotics, IJCAI, pp.670-672, 1985.

S. Singh and D. Bertsekas, Reinforcement learning for dynamic channel allocation in cellular telephone systems, Advances in Neural Information Processing Systems, vol.9, pp.974-980, 1997.

B. F. Skinner, The Behavior of Organisms: An Experimental Analysis, 1938.

E. J. Sondik, The Optimal Control of Partially Observable Markov Processes, Stanford University, 1971.

P. Stone and D. McAllester, An architecture for action selection in robotic soccer.

P. Stone and R. S. Sutton, Keepaway Soccer: A Machine Learning Testbed, pp.214-223, 2001.

P. Stone and M. M. Veloso, Team-Partitioned, Opaque-Transition Reinforcement Learning, 1998.

R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems, vol.8, pp.1038-1044, 1996.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, p.322, 1998.
DOI : 10.1109/TNN.1998.712192

R. S. Sutton, D. Precup, and S. Singh, Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.

M. Taylor, S. Whiteson, and P. Stone, Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning, Proceedings of the Genetic and Evolutionary Computation Conference, pp.1321-1349, 2006.

M. E. Taylor, G. Kuhlmann, and P. Stone, Autonomous transfer for reinforcement learning, Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.283-290, 2008.

M. E. Taylor and P. Stone, Behavior transfer for value-function-based reinforcement learning, Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems , AAMAS '05, 2005.
DOI : 10.1145/1082473.1082482

M. E. Taylor, P. Stone, and Y. Liu, Transfer Learning via Inter-Task Mappings for Temporal Difference Learning, J. Mach. Learn. Res., vol.8, pp.2125-2167, 2007.

M. E. Taylor, S. Whiteson, and P. Stone, Transfer via inter-task mappings in policy search reinforcement learning, Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, 2007.

G. Tesauro, Neurogammon: a neural-network backgammon program, 1990 IJCNN International Joint Conference on Neural Networks, pp.33-39, 1990.
DOI : 10.1109/IJCNN.1990.137821

G. Tesauro, Programming backgammon using self-teaching neural nets, Artificial Intelligence, vol.134, issue.1-2, pp.181-199, 2002.
DOI : 10.1016/S0004-3702(01)00110-2

J. C. Todd, J. L. Carroll, and T. S. Peterson, Memory-guided Exploration in Reinforcement Learning, INNS-IEEE International Joint Conference on Neural Networks, pp.1002-1007, 2001.

B. Van Roy, D. P. Bertsekas, and Y. Lee, A neuro-dynamic programming approach to retailer inventory management, Proceedings of the 36th IEEE Conference on Decision and Control, 1997.

M. Veloso, P. Stone, and M. Bowling, Anticipation as a key for collaboration in a team of agents: A case study in robotic soccer, SPIE Sensor Fusion and Decentralized Control in Robotic Systems II, 1999.

C. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, pp.279-292, 1992.

C. J. Watkins, Models of Delayed Reinforcement Learning, 1989.

S. D. Whitehead, Reinforcement learning for the adaptive control of perception and action.

S. Woodcock, Game AI: The State of the Industry.

T. Yu, Behavior Simulation for Autonomous Agents in Crowded Environment.

W. Zhang and T. G. Dietterich, A reinforcement learning approach to Job-shop Scheduling, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), pp.1114-1120, 1995.