
APPRENTISSAGE SÉQUENTIEL : Bandits, Statistique et Renforcement (Sequential Learning: Bandits, Statistics and Reinforcement)

Odalric-Ambrym Maillard 1, 2 
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : This thesis studies the following topics in machine learning: bandit theory, statistical learning, and reinforcement learning. The common underlying thread is the non-asymptotic study of various notions of adaptation: to an environment or an opponent in Part I on bandit theory, to the structure of a signal in Part II on statistical theory, and to the structure of states and rewards or to some state model of the world in Part III on reinforcement learning.

First, we derive a non-asymptotic analysis of a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit that, in the case of distributions with finite support, matches the asymptotic distribution-dependent lower bound known for this problem. Then, for a multi-armed bandit facing a possibly adaptive opponent, we introduce history-based models that capture certain weaknesses of the opponent, and show how one can exploit such models to design algorithms adaptive to those weaknesses.

Next, we contribute to the regression setting and show how the use of random matrices can be beneficial both theoretically and numerically when the considered hypothesis space has a large, possibly infinite, dimension. We also use random matrices in the sparse-recovery setting to build sensing operators that allow for recovery when the basis is far from orthogonal.

Finally, we combine Parts I and II to provide a non-asymptotic analysis of reinforcement-learning algorithms such as Bellman-residual minimization and a version of least-squares temporal difference that uses random projections, and then, upstream of the Markov Decision Process setting, discuss the practical problem of choosing a good model of states.
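The first contribution mentioned above is a Kullback-Leibler-based index policy for the stochastic multi-armed bandit. As a minimal illustration only (not the thesis's exact algorithm, which handles general distributions with finite support), a KL-UCB-style upper-confidence index for Bernoulli rewards can be computed by bisection, exploiting the fact that the Bernoulli KL divergence kl(p, q) is increasing in q on [p, 1]:

```python
import math

def kl_bernoulli(p, q):
    # Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q).
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, precision=1e-6):
    # Upper-confidence index for one arm: the largest q >= mean such that
    # pulls * kl(mean, q) <= log(t).  Found by bisection, since
    # kl(mean, .) is increasing on [mean, 1].
    if pulls == 0:
        return 1.0  # unexplored arm: optimistic index
    budget = math.log(max(t, 1)) / pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) > budget:
            hi = mid
        else:
            lo = mid
    return lo
```

At each round the policy would pull the arm with the largest index; as an arm is pulled more often, `budget` shrinks and its index tightens toward its empirical mean, which is the mechanism behind matching the distribution-dependent lower bound.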
Document type : Doctoral thesis

Cited literature : 267 references
Contributor : Philippe Preux
Submitted on : Wednesday, July 17, 2013 - 9:37:33 AM
Last modification on : Saturday, June 25, 2022 - 7:39:28 PM
Long-term archiving on : Friday, October 18, 2013 - 4:22:21 AM


  • HAL Id : tel-00845410, version 1


Odalric-Ambrym Maillard. APPRENTISSAGE SÉQUENTIEL : Bandits, Statistique et Renforcement.. Machine Learning [cs.LG]. Université des Sciences et Technologie de Lille - Lille I, 2011. English. ⟨tel-00845410⟩


