Exploration-Exploitation with Thompson Sampling in Linear Systems

Marc Abeille 1, 2
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : This dissertation studies Thompson Sampling (TS) algorithms designed to address the exploration-exploitation dilemma inherent in sequential decision-making under uncertainty. As opposed to algorithms based on the optimism-in-the-face-of-uncertainty (OFU) principle, where exploration is performed by selecting the most favorable model within the set of plausible ones, TS algorithms rely on randomization to drive exploration, and are thus much more computationally efficient. We focus on linearly parametrized problems that allow for continuous state-action spaces, namely Linear Bandit (LB) problems and Linear Quadratic (LQ) control problems. We derive two novel analyses of the regret of TS algorithms in those settings. While the regret bound obtained for LB is similar to previous results, the proof sheds new light on the functioning of TS and allows us to extend the analysis to LQ problems. As a result, we prove the first regret bound for TS in LQ and show that the frequentist regret is of order O(sqrt(T)), which matches the existing guarantee for the regret of OFU algorithms in LQ. Finally, we propose an application of exploration-exploitation techniques to the practical problem of portfolio construction, and discuss the need for active exploration in this setting.
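To make the randomized exploration described in the abstract concrete, here is a minimal sketch of Thompson Sampling on a linear bandit with a finite arm set. It is an illustration under stated assumptions, not the thesis's algorithm or analysis: the function name `linear_thompson_sampling`, the Gaussian sampling distribution, the ridge parameter `reg`, the sampling `scale`, and the simulated noise model are all hypothetical choices made for this example.

```python
import numpy as np

def linear_thompson_sampling(arms, theta_star, horizon, noise_std=0.1,
                             reg=1.0, scale=1.0, seed=0):
    """Simulate TS on a linear bandit; all parameters are illustrative assumptions.

    arms: (K, d) array of candidate actions; theta_star: (d,) true parameter,
    used only to simulate rewards and to measure regret.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = reg * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of reward * action
    best = np.max(arms @ theta_star)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Regularized least-squares estimate of the unknown parameter.
        theta_hat = np.linalg.solve(V, b)
        # Randomization step: draw theta_tilde ~ N(theta_hat, scale^2 * V^{-1}).
        # With V = L L^T, the vector L^{-T} z has covariance V^{-1}.
        L = np.linalg.cholesky(V)
        z = rng.standard_normal(d)
        theta_tilde = theta_hat + scale * np.linalg.solve(L.T, z)
        # Act greedily with respect to the sampled parameter.
        x = arms[np.argmax(arms @ theta_tilde)]
        reward = x @ theta_star + noise_std * rng.standard_normal()
        # Update the sufficient statistics with the new observation.
        V += np.outer(x, x)
        b += reward * x
        total += best - x @ theta_star
        regret[t] = total
    return regret, np.linalg.solve(V, b)

if __name__ == "__main__":
    arms = np.eye(3)
    theta_star = np.array([1.0, 0.5, 0.0])
    regret, theta_hat = linear_thompson_sampling(arms, theta_star, horizon=500)
    print("cumulative regret after 500 rounds:", regret[-1])
```

Each round costs one linear solve and one Gaussian draw, whereas an OFU-style method would have to optimize jointly over actions and the set of plausible parameters. That difference is the computational advantage the abstract refers to.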
Document type :
Theses

Cited literature: 79 references

https://tel.archives-ouvertes.fr/tel-01816069
Contributor : Marc Abeille
Submitted on : Thursday, June 14, 2018 - 5:15:31 PM
Last modification on : Friday, March 22, 2019 - 1:36:35 AM
Long-term archiving on : Monday, September 17, 2018 - 10:48:11 AM

File

These_Marc_Abeille.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01816069, version 1

Citation

Marc Abeille. Exploration-Exploitation with Thompson Sampling in Linear Systems. Mathematics [math]. Université de Lille 1, 2017. English. ⟨tel-01816069⟩
