M. Papageorgiou, C. Diakaki, V. Dinopoulou, A. Kotsialos, and Y. Wang, Review of road traffic control strategies, Proceedings of the IEEE, vol.91, issue.12, pp.2043-2067, 2003.

X. Zheng and W. Recker, An adaptive control algorithm for traffic-actuated signals, Transportation Research Part C: Emerging Technologies, vol.30, pp.93-115, 2013.

K. Dresner and P. Stone, A multiagent approach to autonomous intersection management, Journal of Artificial Intelligence Research, pp.591-656, 2008.

S. I. Guler, M. Menendez, and L. Meier, Using connected vehicle technology to improve the efficiency of intersections, Transportation Research Part C: Emerging Technologies, vol.46, pp.121-131, 2014.

A. L. Bazzan, Opportunities for multiagent systems and multiagent reinforcement learning in traffic control, Autonomous Agents and Multi-Agent Systems, vol.18, issue.3, pp.342-375, 2009.

S. El-tantawy and B. Abdulhai, Multi-agent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC), Proceedings of IEEE Conference on Intelligent Transportation Systems, pp.319-326, 2012.

P. Koonce, L. Rodegerdts, K. Lee, S. Quayle, S. Beaird et al., Traffic signal timing manual, Tech. Rep, 2008.

Highway capacity manual, Transportation Research Board, 2000.

D. I. Robertson, TRANSYT method for area traffic control, Traffic Engineering & Control, vol.11, issue.6, 1969.

R. Vincent and C. Young, Self-optimising traffic signal control using microprocessors: the TRRL MOVA strategy for isolated intersections, Traffic Engineering & Control, vol.27, issue.7-8, pp.385-387, 1986.

P. Lowrie, The Sydney coordinated adaptive traffic system-principles, methodology, algorithms, International Conference on Road Traffic Signalling, 1982.

P. Hunt, D. Robertson, R. Bretherton, and M. Royle, The SCOOT on-line traffic signal optimisation technique, Traffic Engineering & Control, vol.23, issue.4, 1982.

N. H. Gartner, OPAC: A demand-responsive strategy for traffic signal control, Transportation Research Record, issue.906, pp.75-81, 1983.

V. Mauro and C. D. Taranto, UTOPIA, Proceedings of IFAC/IFORS Conference on Control, Computers and Communications in Transport, 1989.

D. Robertson and R. Bretherton, Optimum control of an intersection for any known sequence of vehicle arrivals, Proceedings of the 2nd IFAC/IFIP/IFORS Symposium on Traffic Control and Transportation Systems, 1974.

J. Henry, J. Farges, and J. Tuffal, The PRODYN real time traffic algorithm, IFAC/IFIP/IFORS Conference on Control in Transportation Systems, 1984.

K. L. Head, P. B. Mirchandani, and D. Sheppard, Hierarchical framework for real-time traffic control, Transportation Research Record, issue.1360, pp.82-88, 1992.

P. Mirchandani and L. Head, A real-time traffic signal control system: architecture, algorithms, and analysis, Transportation Research Part C: Emerging Technologies, vol.9, issue.6, p.153, 2001.

C. Cai, C. K. Wong, and B. G. Heydecker, Adaptive traffic signal control using approximate dynamic programming, Transportation Research Part C: Emerging Technologies, vol.17, issue.5, pp.456-474, 2009.

J. Wu, A. Abbas-turki, and A. E. Moudni, Cooperative driving: an ant colony system for autonomous intersection management, Applied Intelligence, vol.37, issue.2, pp.207-222, 2012.

V. Milanés, J. Villagrá, J. Godoy, J. Simó, J. Pérez et al., An intelligent V2I-based traffic management system, IEEE Transactions on Intelligent Transportation Systems, vol.13, issue.1, pp.49-58, 2012.

J. Lee and B. Park, Development and evaluation of a cooperative vehicle intersection control algorithm under the connected vehicles environment, IEEE Transactions on Intelligent Transportation Systems, vol.13, issue.1, pp.81-90, 2012.

K. Dresner and P. Stone, Multiagent traffic management: A reservation-based intersection control mechanism, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, vol.2, pp.530-537, 2004.

L. Li, D. Wen, and D. Yao, A survey of traffic control with vehicular communications, IEEE Transactions on Intelligent Transportation Systems, vol.15, issue.1, pp.425-432, 2014.

B. Van-arem, C. J. Van-driel, and R. Visser, The impact of cooperative adaptive cruise control on traffic-flow characteristics, IEEE Transactions on Intelligent Transportation Systems, vol.7, issue.4, pp.429-436, 2006.

H. K. Lo, A novel traffic signal control formulation, Transportation Research Part A: Policy and Practice, vol.33, issue.6, pp.433-448, 1999.

H. K. Lo, E. Chang, and Y. C. Chan, Dynamic network traffic control, Transportation Research Part A: Policy and Practice, vol.35, issue.8, pp.721-744, 2001.

K. Aboudolas, M. Papageorgiou, A. Kouvelas, and E. Kosmatopoulos, A rolling-horizon quadratic-programming approach to the signal control problem in large-scale congested urban road networks, Transportation Research Part C: Emerging Technologies, vol.18, issue.5, pp.680-694, 2010.

M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, 2014.

X. Yu and W. W. Recker, Stochastic adaptive control model for traffic signal systems, Transportation Research Part C: Emerging Technologies, vol.14, issue.4, pp.263-282, 2006.
DOI : 10.1016/j.trc.2006.08.002
URL : https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1103&context=eeng_fac

R. Haijema and J. Van, An MDP decomposition approach for traffic control at isolated signalized intersections, Probability in the Engineering and Informational Sciences, vol.22, pp.587-602, 2008.
DOI : 10.1017/s026996480800034x

M. Dell'orco, A dynamic network loading model for mesosimulation in transportation systems, European Journal of Operational Research, vol.175, issue.3, pp.1447-1454, 2006.

H. B. Celikoglu and M. Dell'orco, Mesoscopic simulation of a dynamic link loading process, Transportation Research Part C: Emerging Technologies, vol.15, issue.5, pp.329-344, 2007.

M. J. Lighthill and G. B. Whitham, On kinematic waves I: flood movement in long rivers II: a theory of traffic flow on long crowded roads, Proceedings of the Royal Society of London A, vol.229, pp.281-345, 1955.

P. I. Richards, Shock waves on the highway, Operations Research, vol.4, issue.1, pp.42-51, 1956.

C. F. Daganzo, The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory, Transportation Research Part B: Methodological, vol.28, issue.4, pp.269-287, 1994.

A. Sumalee, R. Zhong, T. Pan, and W. Szeto, Stochastic cell transmission model (SCTM): A stochastic dynamic traffic model for traffic state surveillance and assignment, Transportation Research Part B: Methodological, vol.45, issue.3, p.155, 2011.

H. Abouaissa, M. Fliess, and C. Join, Fast parametric estimation for macroscopic traffic flow model, 17th IFAC World Congress, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00259032

PARAMICS

MITSIM

VISSIM

K. Nagel and M. Schreckenberg, A cellular automaton model for freeway traffic, Journal de Physique I, vol.2, issue.12, pp.2221-2229, 1992.
URL : https://hal.archives-ouvertes.fr/jpa-00246697

O. K. Tonguz, W. Viriyasitavat, and F. Bai, Modeling urban traffic: a cellular automata approach, IEEE Communications Magazine, vol.47, issue.5, pp.142-150, 2009.

J. Esser and M. Schreckenberg, Microscopic simulation of urban traffic based on cellular automata, International Journal of Modern Physics C, vol.8, issue.05, pp.1025-1036, 1997.

S. Maerivoet and B. De-moor, Cellular automata models of road traffic, Physics Reports, vol.419, issue.1, pp.1-64, 2005.

M. Florian, M. Mahut, and N. Tremblay, Application of a simulation-based dynamic traffic assignment model, European Journal of Operational Research, vol.189, issue.3, pp.1381-1392, 2008.

M. Wiering, Multi-agent reinforcement learning for traffic light control, International Conference on Machine Learning (ICML), pp.1151-1158, 2000.

M. Wiering, J. Vreeken, J. Van-veenen, and A. Koopman, Simulation and optimization of traffic in a city, IEEE Intelligent Vehicles Symposium, pp.453-458, 2004.

R. E. Bellman, Dynamic Programming, 1957.

R. J. Dakin, A tree-search algorithm for mixed integer programming problems, The Computer Journal, vol.8, issue.3, pp.250-255, 1965.

T. H. Heung, T. K. Ho, and Y. F. Fung, Coordinated road-junction traffic control by dynamic programming, IEEE Transactions on Intelligent Transportation Systems, vol.6, issue.3, pp.341-350, 2005.

J. Wu, A. Abbas-turki, and A. E. Moudni, Discrete methods for urban intersection traffic controlling, IEEE 69th Vehicular Technology Conference, IEEE, pp.1-5, 2009.

D. Teodorović, V. Varadarajan, J. Popović, M. R. Chinnaswamy, and S. Ramaraj, Dynamic programming-neural network real-time traffic adaptive signal control algorithm, Annals of Operations Research, vol.143, issue.1, pp.123-131, 2006.

F. Yan, M. Dridi, and A. E. Moudni, A scheduling approach for autonomous vehicle sequencing problem at multi-intersections, International Journal of Operational Research, vol.9, issue.1, pp.57-68, 2011.

F. Yan, M. Dridi, and A. El-moudni, New vehicle sequencing algorithms with vehicular infrastructure integration for an isolated intersection, Telecommunication Systems, vol.50, issue.4, pp.325-337, 2012.

B. Yin, M. Dridi, and A. E. Moudni, Forward search algorithm based on dynamic programming for real-time adaptive traffic signal control, IET Intelligent Transportation Systems, vol.9, issue.7, pp.754-764, 2015.

Y. Zhang and Y. Xie, Artificial Intelligence Applications to Critical Transportation Issues, p.11, 2012.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 1998.

B. Park, C. Messer, and T. Urbanik, Traffic signal optimization program for oversaturated conditions: genetic algorithm approach, Transportation Research Record: Journal of the Transportation Research Board, issue.1683, p.157, 1999.

B. Park, C. Messer, and T. Urbanik II, Enhanced genetic algorithm for signal-timing optimization of oversaturated intersections, Transportation Research Record: Journal of the Transportation Research Board, issue.1727, pp.32-41, 2000.

H. Ceylan and M. G. Bell, Traffic signal timing optimisation based on genetic algorithm approach, including drivers' routing, Transportation Research Part B: Methodological, vol.38, issue.4, pp.329-342, 2004.

K. B. Kesur, Advances in genetic algorithm optimization of traffic signals, Journal of Transportation Engineering, 2009.

J. Lee, B. Abdulhai, A. Shalaby, and E. Chung, Real-time optimization for adaptive traffic signal control using genetic algorithms, Journal of Intelligent Transportation Systems, vol.9, issue.3, pp.111-122, 2005.

J. Kennedy and R. Eberhart, Particle swarm optimization, IEEE International Conference on Neural Networks, vol.4, pp.1942-1948, 1995.

D. Teodorović, Swarm intelligence systems for transportation engineering: Principles and applications, Transportation Research Part C: Emerging Technologies, vol.16, issue.6, pp.651-667, 2008.

Y. Wei, Q. Shao, Y. Han, and B. Fan, Intersection signal control approach based on PSO and simulation, Second International Conference on Genetic and Evolutionary Computing, pp.277-280, 2008.

J. García-nieto, E. Alba, and A. C. Olivera, Swarm intelligence for traffic light scheduling: Application to real urban areas, Engineering Applications of Artificial Intelligence, vol.25, issue.2, pp.274-283, 2012.

J. Garcia-nieto, A. C. Olivera, and E. Alba, Optimal cycle program of traffic lights with particle swarm optimization, IEEE Transactions on Evolutionary Computation, vol.17, issue.6, pp.823-839, 2013.

R. Putha, L. Quadrifoglio, and E. Zechman, Comparing ant colony optimization and genetic algorithm approaches for solving traffic signal coordination under oversaturation conditions, Computer-Aided Civil and Infrastructure Engineering, vol.27, issue.1, pp.14-28, 2012.

D. Mckenney and T. White, Distributed and adaptive traffic signal control within a realistic traffic simulation, Engineering Applications of Artificial Intelligence, vol.26, issue.1, pp.574-583, 2013.

S. Chiu and S. Chand, Adaptive traffic signal control using fuzzy logic, Second IEEE International Conference on Fuzzy Systems, pp.1371-1376, 1993.

J. Niittymäki and M. Pursula, Signal control using fuzzy logic, vol.116, pp.11-22, 2000.

M. B. Trabia, M. S. Kaseko, and M. Ande, A two-stage fuzzy logic controller for traffic signals, Transportation Research Part C: Emerging Technologies, vol.7, issue.6, pp.353-367, 1999.

L. Zhang, H. Li, and P. D. Prevedouros, Signal control for oversaturated intersections using fuzzy logic, Transportation Research Board Annual Meeting, 2005.

Y. S. Murat and E. Gedizlioglu, A fuzzy logic multi-phased signal control model for isolated junctions, Transportation Research Part C: Emerging Technologies, vol.13, issue.1, pp.19-36, 2005.

M. C. Choy, D. Srinivasan, and R. L. Cheu, Cooperative, hybrid agent architecture for real-time traffic signal control, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol.33, issue.5, pp.597-607, 2003.

J. Lee and H. Lee-kwang, Distributed and cooperative fuzzy controllers for traffic intersections group, Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.29, pp.263-271, 1999.

B. P. Gokulan and D. Srinivasan, Distributed geometric fuzzy multiagent urban traffic signal control, IEEE Transactions on Intelligent Transportation Systems, vol.11, issue.3, pp.714-727, 2010.

D. Zhao, Y. Dai, and Z. Zhang, Computational intelligence in urban traffic signal control: A survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.42, issue.4, p.159, 2012.

M. C. Choy, D. Srinivasan, and R. L. Cheu, Neural networks for continuous online learning and control, IEEE Transactions on Neural Networks, vol.17, issue.6, pp.1511-1531, 2006.

G. Shen and X. Kong, Study on road network traffic coordination control technique with bus priority, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.39, issue.3, pp.343-351, 2009.

D. Srinivasan, M. C. Choy, and R. L. Cheu, Neural networks for real-time traffic signal control, IEEE Transactions on Intelligent Transportation Systems, vol.7, issue.3, pp.261-272, 2006.

D. Srinivasan and M. Choy, Cooperative multi-agent system for coordinated traffic signal control, IEE Proceedings-Intelligent Transport Systems, vol.153, pp.41-50, 2006.

E. Bingham, Reinforcement learning in neurofuzzy traffic signal control, European Journal of Operational Research, vol.131, issue.2, pp.232-241, 2001.

L. Busoniu, R. Babuska, B. D. Schutter, and D. Ernst, Reinforcement learning and dynamic programming using function approximators, 2010.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic programming: an overview, Proceedings of IEEE 34th International Conference on Decision and Control (CDC), vol.1, pp.560-564, 1995.

W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality, 2007.

L. Busoniu, R. Babuska, and B. D. Schutter, A comprehensive survey of multiagent reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, vol.38, issue.2, pp.156-172, 2008.

P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, Handbook of intelligent control: Neural, fuzzy, and adaptive approaches, vol.15, pp.493-525, 1992.

F. Wang, H. Zhang, and D. Liu, Adaptive dynamic programming: an introduction, Computational Intelligence Magazine, IEEE, vol.4, issue.2, pp.39-47, 2009.
DOI : 10.1109/mci.2009.932261

J. Si and Y. Wang, Online learning control by association and reinforcement, IEEE Transactions on Neural Networks, vol.12, issue.2, pp.264-276, 2001.

X. Xu, L. Zuo, and Z. H. Huang, Reinforcement learning algorithms with function approximation: Recent advances and applications, Information Sciences, vol.261, pp.1-31, 2014.

I. Arel, C. Liu, T. Urbanik, and A. G. Kohls, Reinforcement learning-based multiagent system for network traffic signal control, IET Intelligent Transportation Systems, vol.4, issue.2, pp.128-135, 2010.

S. Box and B. Waterson, An automated signalized junction controller that learns strategies by temporal difference reinforcement learning, Engineering Applications of Artificial Intelligence, vol.26, issue.1, pp.652-659, 2013.
DOI : 10.1016/j.engappai.2012.02.013
URL : https://eprints.soton.ac.uk/336298/1/tdpaper2012%2520%25281%2529.pdf

L. Prashanth and S. Bhatnagar, Reinforcement learning with function approximation for traffic signal control, IEEE Transactions on Intelligent Transportation Systems, vol.12, issue.2, pp.412-421, 2011.

T. Li, D. B. Zhao, and J. Q. Yi, Adaptive dynamic programming for multi-intersections traffic signal intelligent control, Proceedings of IEEE Conference on Intelligent Transportation Systems, pp.286-291, 2008.
DOI : 10.1109/itsc.2008.4732718

L. Baird and A. W. Moore, Gradient descent for general reinforcement learning, Advances in neural information processing systems, pp.968-974, 1999.

J. N. Tsitsiklis and B. Van-roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874
URL : http://web.mit.edu/jnt/www/Papers/J063-97-bvr-td.pdf

S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://www-anw.cs.umass.edu/pubs/1995_96/bradtke_b_ML96.pdf

J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol.49, pp.233-246, 2002.

D. Sen, Kernel-based reinforcement learning, Machine Learning, vol.49, issue.2-3, p.161, 2002.

S. El-tantawy, B. Abdulhai, and H. Abdelgawad, Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto, IEEE Transactions on Intelligent Transportation Systems, vol.14, issue.3, pp.1140-1150, 2013.

S. El-tantawy, B. Abdulhai, and H. Abdelgawad, Design of reinforcement learning parameters for seamless application of adaptive traffic signal control, Journal of Intelligent Transportation Systems, vol.18, issue.3, pp.227-245, 2014.

M. Wiering and M. Van-otterlo, Reinforcement Learning: State-of-the-art, Springer Science & Business Media, vol.12, 2012.

S. Richter, D. Aberdeen, and J. Yu, Natural actor-critic for road traffic optimisation, Advances in neural information processing systems, pp.1169-1176, 2006.

A. A. Sherstov and P. Stone, Function approximation via tile coding: Automating parameter choice, Abstraction, Reformulation and Approximation, pp.194-205, 2005.
DOI : 10.1007/11527862_14
URL : http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/SARA05.ps

T. T. Pham, T. Brys, M. E. Taylor, T. Brys, M. M. Drugan et al., Learning coordinated traffic light control, Proceedings of the Adaptive and Learning Agents workshop, vol.10, pp.1196-1201, 2013.

M. Abdoos, N. Mozayani, and A. L. Bazzan, Hierarchical control of traffic signals using Q-learning with tile coding, Applied Intelligence, vol.40, issue.2, pp.201-213, 2014.

A. L. Bazzan, A distributed approach for coordination of traffic signal agents, Autonomous Agents and Multi-Agent Systems, vol.10, issue.1, pp.131-164, 2005.

P. Balaji, X. German, and D. Srinivasan, Urban traffic signal control using reinforcement learning agents, IET Intelligent Transportation Systems, vol.4, issue.3, pp.177-188, 2010.

A. Salkham, R. Cunningham, A. Garg, and V. Cahill, A collaborative reinforcement learning approach to urban traffic control optimization, Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol.02, pp.560-566, 2008.

M. A. Khamis and W. Gomaa, Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework, Engineering Applications of Artificial Intelligence, vol.29, pp.134-151, 2014.

J. France and A. A. Ghorbani, A multiagent system for optimizing urban traffic, International IEEE/WIC Conference on Intelligent Agent Technology, pp.411-414, 2003.

Z. S. Yang, X. Chen, Y. S. Tang, and J. P. Sun, Intelligent cooperation control of urban traffic networks, International Conference on Machine Learning and Cybernetics, vol.3, pp.1482-1486, 2005.

A. L. Bazzan, D. De-oliveira, and B. C. Silva, Learning in groups of traffic signals, Engineering Applications of Artificial Intelligence, vol.23, issue.4, pp.560-568, 2010.

L. Kuyer, S. Whiteson, B. Bakker, and N. Vlassis, Multiagent reinforcement learning for urban traffic control using coordination graphs, Machine Learning and Knowledge Discovery in Databases, pp.656-671, 2008.

J. C. Medina and R. F. Benekohal, Traffic signal control using reinforcement learning and the max-plus algorithm as a coordinating strategy, Proceedings of IEEE Conference on Intelligent Transportation Systems, pp.596-601, 2012.

J. R. Kok and N. Vlassis, Collaborative multiagent reinforcement learning by payoff propagation, Journal of Machine Learning Research, vol.7, pp.1789-1828, 2006.

P. J. Werbos, Reinforcement learning and approximate dynamic programming (RLADP)-foundations, common misconceptions, and the challenges ahead, p.163, 2012.

P. Mannion, J. Duggan, and E. Howley, An experimental review of reinforcement learning algorithms for adaptive traffic signal control, 2015.

NEMA Standards Publication TS2-Traffic Controller Assemblies with NTCIP Requirements, National Electrical Manufacturers Association Std, 2003.

P. E. Hart, N. J. Nilsson, and B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.

A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol.72, issue.1, pp.81-138, 1995.

B. Yin, M. Dridi, and A. E. Moudni, Traffic control model and algorithm based on decomposition of MDP, IEEE International Conference on Control, Decision and Information Technologies, pp.225-230, 2014.

R. Dechter and J. Pearl, Generalized best-first search strategies and the optimality of A*, Journal of the ACM (JACM), vol.32, issue.3, pp.505-536, 1985.

E. A. Hansen and S. Zilberstein, LAO*: a heuristic search algorithm that finds solutions with loops, Artificial Intelligence, vol.129, issue.1, pp.35-62, 2001.

N. J. Nilsson, Artificial intelligence: a new synthesis, 1998.

P. J. Werbos and X. Pang, Generalized maze navigation: SRN critics solve what feedforward or hebbian nets cannot, IEEE International Conference on Systems, Man, and Cybernetics, vol.3, pp.1764-1769, 1996.

C. Cai, Adaptive traffic signal control using approximate dynamic programming, 2009.

D. V. Prokhorov and D. C. Wunsch, Adaptive critic designs, IEEE Transactions on Neural Networks, vol.8, issue.5, pp.997-1007, 1997.

D. Zhao, Z. Hu, Z. Xia, C. Alippi, Y. Zhu et al., Full-range adaptive cruise control based on supervised adaptive dynamic programming, Neurocomputing, vol.125, pp.57-67, 2014.

X. Xu, H.-G. He, and D. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research, vol.16, issue.1, pp.259-292, 2002.

T. Söderström and P. Stoica, Instrumental variable methods for system identification, Circuits, Systems and Signal Processing, vol.21, issue.1, pp.1-9, 2002.

N. H. Gartner, P. J. Tarnoff, and C. M. Andrews, Evaluation of optimized policies for adaptive control strategy, Transportation Research Record, issue.1324, pp.105-114, 1991.

F. V. Webster, Traffic signal settings, Road Research Laboratory, issue.39, 1958.

S. B. Cools, C. Gershenson, and B. D'hooghe, Self-organizing traffic lights: A realistic simulation, Advances in Applied Self-organizing Systems, pp.41-50, 2008.

L. Ljung and T. Söderström, 1983.