
Reinforcement Learning: The Multi-Player Case

Julien Pérolat
SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: This thesis focuses on learning from historical data in sequential multi-agent environments. We study the problem of batch learning in Markov games (MGs), the generalization of Markov decision processes (MDPs) to the multi-agent setting. Our approach is to propose learning algorithms that find equilibria in games where knowledge of the game is limited to interaction samples (also called batch data). To achieve this, we explore two main approaches. The first is approximate dynamic programming. We generalize several batch algorithms from MDPs to zero-sum two-player MGs, and extend several approximate dynamic programming bounds from the L∞-norm to the Lp-norm. We then describe, test, and compare algorithms based on those dynamic programming schemes. These algorithms, however, are highly sensitive to the discount factor (the parameter that controls the time horizon of the problem). To improve them, we study several non-stationary variants of approximate dynamic programming methods in the zero-sum two-player case. We also show that non-stationary strategies can be used in general-sum games, although the resulting guarantees are much looser than those for MDPs or zero-sum two-player MGs. The second approach studied in this manuscript is the Bellman residual approach, which reduces the problem of learning from batch data to the minimization of a loss function. In a zero-sum two-player MG, we prove that applying Newton's method to certain Bellman residuals is equivalent either to the Least Squares Policy Iteration (LSPI) algorithm or to the Bellman Residual Minimizing Policy Iteration (BRMPI) algorithm. We leverage this link to address the oscillation of LSPI in MDPs and in MGs. We then show that a Bellman residual approach can be used to learn from batch data in general-sum MGs.
Finally, in the last part of this dissertation, we study independent multi-agent learning in Multi-Stage Games (MSGs). We provide an actor-critic independent learning algorithm that provably converges in zero-sum two-player MSGs and in cooperative MSGs, and that empirically converges with function approximation on the game of Alesia.
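To make the zero-sum setting above concrete, here is a minimal sketch of value iteration with the minimax Bellman operator, restricted to the turn-based special case where each state is controlled by one player, so the per-state matrix game reduces to a plain max or min. The toy game, its states, and its rewards are hypothetical illustrations, not taken from the thesis.

```python
# Value iteration for a zero-sum two-player Markov game, turn-based
# special case. Player 1 maximizes the discounted return, player 2
# minimizes it. Hypothetical toy game, not an example from the thesis.

GAMMA = 0.9  # discount factor; the thesis notes ADP is sensitive to it

# States 0..2; state 2 is terminal. 'max' states belong to player 1,
# 'min' states to player 2. TRANSITIONS[s] lists (reward, next_state)
# pairs, one per available action (deterministic for simplicity).
OWNER = {0: 'max', 1: 'min', 2: None}
TRANSITIONS = {
    0: [(1.0, 1), (0.0, 2)],
    1: [(-1.0, 0), (0.5, 2)],
    2: [],
}

def bellman(v):
    """One application of the minimax Bellman operator.

    In the turn-based case the minimax value of the per-state matrix
    game collapses to a max (player 1's states) or a min (player 2's).
    """
    new_v = {}
    for s, acts in TRANSITIONS.items():
        if not acts:  # terminal state
            new_v[s] = 0.0
            continue
        vals = [r + GAMMA * v[s2] for r, s2 in acts]
        new_v[s] = max(vals) if OWNER[s] == 'max' else min(vals)
    return new_v

def value_iteration(tol=1e-9):
    """Iterate the operator to its unique fixed point (gamma-contraction)."""
    v = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_v = bellman(v)
        if max(abs(new_v[s] - v[s]) for s in v) < tol:
            return new_v
        v = new_v

v_star = value_iteration()
```

In the general simultaneous-move case each Bellman backup instead solves a small matrix game (a linear program per state), and the batch algorithms in the thesis further replace exact backups with regression on interaction samples.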

Cited literature: 96 references
Contributor: Philippe Preux
Submitted on: Friday, July 6, 2018 - 4:37:58 PM
Last modification on: Wednesday, March 23, 2022 - 3:51:19 PM
Long-term archiving on: Monday, October 1, 2018 - 1:24:09 AM


Files produced by the author(s)


  • HAL Id: tel-01820700, version 1


Julien Pérolat. Reinforcement Learning: The Multi-Player Case. Artificial Intelligence [cs.AI]. Université de Lille 1 - Sciences et Technologies, 2017. English. ⟨tel-01820700⟩


