Reinforcement Learning: The Multi-Player Case

Julien Pérolat

Thèse Année : 2017

Reinforcement Learning: The Multi-Player Case

Apprentissage par Renforcement: Le Cas Multijoueur

(1, 2)

1
2

Julien Pérolat

Fonction : Auteur
PersonId : 14403
IdHAL : julien-perolat
IdRef : 137378181

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Sequential Learning

Résumé

This thesis mainly focuses on learning from historical data in a sequential multi-agent environment. We studied the problem of batch learning in Markov games (MGs). Markov games are a generalization of Markov decision processes (MDPs) to the multi-agent setting. Our approach was to propose learning algorithms to find equilibria in games where the knowledge of the game is limited to interaction samples (also named batch data). To achieve this task, we explore two main approaches. The first approach explored in this thesis is to study approximate dynamic programming techniques. We generalize several batch algorithms from MDPs to zero-sum two-player MGs. This part of our work generalizes several approximate dynamic programming bounds from a L ∞ -norm to a L p -norm. Then we describe, test and compare algorithms based on those dynamic programming schemes. But these algorithms are highly sensitive to the discount factor (a parameter that controls the time horizon of the problem). To improve those algorithms, we studied many non-stationary variants of approximate dynamic methods to the zero sum two player case. In the end, we show that using non-stationary strategies can be used in general sum games. However, the resulting guarantees are very loose compared to the one on MDPs or zero-sum two-player MGs. The second approach studied in this manuscript is the Bellman residual approach. This approach reduces the problem of learning from batch data to the minimization of a loss function. In a zero-sum two-player MG, we prove that using a Newton’s method on some Bellman residuals is either equivalent to the Least Squares Policy Iteration (LSPI) algorithm or to the Bellman Residual Minimizing Policy Iteration (BRMPI) algorithm. We leverage this link to address the oscillation of LSPI in MDPs and in MGs. Then we show that a Bellman residual approach could be used to learn from batch data in general-sum MGs. Finally in the last part of this dissertation, we study multi-agent independent learning in Multi-Stage Games (MSGs). We provide an actor-critic independent learning algorithm that provably converges in zero-sum two-player MSGs and in cooperative MSGs and empirically converges using function approximation on the game of Alesia.

Ce manuscrit de thèse présente des travaux d'apprentissage par renforcement dans le cadre des jeux stochastiques. Les deux première parties de ce manuscrit sont dédiées à l'apprentissage à partir de données dites batch. Une première approche par programmation dynamique approchée est proposée dans le cadre des jeux à deux joueurs à somme nulle et ses limitations sont discutées dans le cadre des jeux à somme générale. Puis, nous étudions une seconde approche par minimisation du résidu de Bellman dans le cadre des jeux à deux joueurs à somme nulle et l'étendons aux jeux à somme générale. Finalement, on s'intéresse à l'apprentissage en ligne et introduisons un algorithme acteur-critique qui converge pour des jeux à somme nulle et des jeux coopératifs à étages.

Mots clés

machine learning algorithm artificial intelligence reinforcement learn- ing game theory multi-agent learning.

apprentissage par renforcement théorie des jeux apprentissage automatique algorithmes intelligence artificielle apprentissage multi-agent.

Domaines

Intelligence artificielle [cs.AI] Informatique et théorie des jeux [cs.GT] Apprentissage [cs.LG] Système multi-agents [cs.MA] Machine Learning [stat.ML]

Fichier principal

phd_perolat.pdf (3.08 Mo)

Origine : Fichiers produits par l'(les) auteur(s)

Preux Philippe : Connectez-vous pour contacter le contributeur

https://theses.hal.science/tel-01820700

Soumis le : vendredi 6 juillet 2018-16:37:58

Dernière modification le : mercredi 24 janvier 2024-09:54:23

Archivage à long terme le : lundi 1 octobre 2018-01:24:09

Dates et versions

tel-01820700 , version 1 (06-07-2018)

Identifiants

HAL Id : tel-01820700 , version 1

Citer

Julien Pérolat. Reinforcement Learning: The Multi-Player Case. Artificial Intelligence [cs.AI]. Université de Lille 1 - Sciences et Technologies, 2017. English. ⟨NNT : ⟩. ⟨tel-01820700⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE

519 Consultations

1512 Téléchargements

Reinforcement Learning: The Multi-Player Case

Apprentissage par Renforcement: Le Cas Multijoueur

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager