
Reinforcement Learning: The Multi-Player Case

Julien Pérolat 1, 2
2 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: This thesis mainly focuses on learning from historical data in a sequential multi-agent environment. We study the problem of batch learning in Markov games (MGs), a generalization of Markov decision processes (MDPs) to the multi-agent setting. Our approach is to propose learning algorithms that find equilibria in games where knowledge of the game is limited to interaction samples (also called batch data). To achieve this, we explore two main approaches. The first approach explored in this thesis is approximate dynamic programming. We generalize several batch algorithms from MDPs to zero-sum two-player MGs, extending several approximate dynamic programming error bounds from the L∞-norm to the Lp-norm. We then describe, test and compare algorithms based on these dynamic programming schemes. However, these algorithms are highly sensitive to the discount factor (a parameter that controls the time horizon of the problem). To improve them, we extend several non-stationary variants of approximate dynamic programming methods to the zero-sum two-player case. Finally, we show that non-stationary strategies can also be used in general-sum games, although the resulting guarantees are much looser than those for MDPs or zero-sum two-player MGs. The second approach studied in this manuscript is the Bellman residual approach, which reduces the problem of learning from batch data to the minimization of a loss function. In a zero-sum two-player MG, we prove that applying Newton's method to certain Bellman residuals is equivalent either to the Least Squares Policy Iteration (LSPI) algorithm or to the Bellman Residual Minimizing Policy Iteration (BRMPI) algorithm. We leverage this link to address the oscillation of LSPI in MDPs and in MGs. We then show that a Bellman residual approach can be used to learn from batch data in general-sum MGs.
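The dynamic programming schemes mentioned above build on Shapley's generalization of the Bellman operator: the one-step maximization of an MDP is replaced, at each state, by the value of a zero-sum matrix game between the two players. A minimal sketch of this value iteration, restricted to 2x2 action spaces so the matrix game has a closed-form value (function names and the toy model are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def matrix_game_value(A):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    # Check for a pure saddle point first.
    maxmin = A.min(axis=1).max()
    minmax = A.max(axis=0).min()
    if np.isclose(maxmin, minmax):
        return maxmin
    # Otherwise the equilibrium is fully mixed; closed form for 2x2 games.
    denom = A[0, 0] + A[1, 1] - A[0, 1] - A[1, 0]
    return (A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]) / denom

def shapley_value_iteration(r, P, gamma, n_iter=200):
    """Value iteration with the Shapley operator.

    r[s]       : 2x2 reward matrix at state s (row player maximizes)
    P[s, a, b] : next-state distribution after joint action (a, b)
    """
    n_states = r.shape[0]
    v = np.zeros(n_states)
    for _ in range(n_iter):
        # At each state, solve the matrix game with payoffs
        # Q(s, a, b) = r(s, a, b) + gamma * E[v(s')].
        v = np.array([matrix_game_value(r[s] + gamma * (P[s] @ v))
                      for s in range(n_states)])
    return v
```

With knowledge of the model replaced by a batch of interaction samples and the exact operator by a regression step, this scheme becomes the kind of approximate value iteration the thesis analyzes.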
Finally, in the last part of this dissertation, we study independent multi-agent learning in Multi-Stage Games (MSGs). We provide an actor-critic independent-learning algorithm that provably converges in zero-sum two-player MSGs and in cooperative MSGs, and that converges empirically with function approximation on the game of Alesia.
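The thesis's algorithm operates on multi-stage games; the sketch below only illustrates the general independent actor-critic pattern on a single-stage zero-sum matrix game: two learners that never observe each other's policy, each maintaining a softmax actor and a running-average critic. Function names, learning rates and the toy payoff are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    p = np.exp(theta - theta.max())
    return p / p.sum()

def independent_actor_critic(payoff, n_steps=5000, lr_actor=0.05, lr_critic=0.1):
    """Two independent actor-critic learners on a zero-sum matrix game.

    payoff[a, b] is the reward of player 1; player 2 receives -payoff[a, b].
    """
    n_a, n_b = payoff.shape
    theta1, theta2 = np.zeros(n_a), np.zeros(n_b)   # actor parameters
    v1, v2 = 0.0, 0.0                               # critic estimates
    for _ in range(n_steps):
        p1, p2 = softmax(theta1), softmax(theta2)
        a = rng.choice(n_a, p=p1)
        b = rng.choice(n_b, p=p2)
        r1 = payoff[a, b]
        r2 = -r1                                     # zero-sum rewards
        # Critic update: running estimate of each agent's expected reward.
        v1 += lr_critic * (r1 - v1)
        v2 += lr_critic * (r2 - v2)
        # Actor update: policy-gradient step weighted by the advantage.
        g1 = -p1; g1[a] += 1.0                       # grad of log p1[a]
        g2 = -p2; g2[b] += 1.0                       # grad of log p2[b]
        theta1 += lr_actor * (r1 - v1) * g1
        theta2 += lr_actor * (r2 - v2) * g2
    return softmax(theta1), softmax(theta2)
```

Each learner only sees its own action and reward, which is what makes the learning "independent"; extending this update to the multi-stage setting with convergence guarantees is the contribution described above.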

Cited literature: 96 references

https://tel.archives-ouvertes.fr/tel-01820700
Contributor: Philippe Preux
Submitted on: Friday, July 6, 2018 - 4:37:58 PM
Last modified on: Friday, May 17, 2019 - 11:39:08 AM
Document(s) archived on: Monday, October 1, 2018 - 1:24:09 AM

File

phd_perolat.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01820700, version 1

Citation

Julien Pérolat. Reinforcement Learning: The Multi-Player Case. Artificial Intelligence [cs.AI]. Université de Lille 1 - Sciences et Technologies, 2017. English. ⟨tel-01820700⟩

Metrics

Record views: 339
File downloads: 1188