Sequential Resource Allocation in Linear Stochastic Bandits

Marta Soare 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : This thesis is dedicated to the study of resource allocation problems in uncertain environments, where an agent can sequentially select which action to take. After each step, the environment returns a noisy observation of the value of the selected action. These observations guide the agent in adapting its resource allocation strategy towards reaching a given objective. In the most typical setting of this kind, the stochastic multi-armed bandit (MAB), it is assumed that each observation is drawn from an unknown probability distribution associated with the selected action and gives no information on the expected value of the other actions. The MAB setting has been widely studied, and optimal allocation strategies have been proposed to solve various objectives under the MAB assumptions. Here, we consider a variant of the MAB setting where there exists a global linear structure in the environment and, by selecting an action, the agent also gathers information on the value of the other actions. Therefore, the agent needs to adapt its resource allocation strategy to exploit the structure in the environment. In particular, we study the design of sequences of actions that the agent should take to reach objectives such as: (i) identifying the best value with a fixed confidence and using a minimum number of pulls, or (ii) minimizing the prediction error on the value of each action. In addition, we investigate how the knowledge gathered by a bandit algorithm in a given environment can be transferred to improve the performance in other similar environments.
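To make the linear structure mentioned in the abstract concrete, here is a minimal sketch, not taken from the thesis itself. It assumes the standard linear bandit model, in which the expected value of an arm with feature vector x is the inner product of x with an unknown parameter. All names and numbers (`THETA_STAR`, `ARMS`, `NOISE_STD`, the number of pulls) are hypothetical and chosen only for illustration. The sketch pulls two arms, fits a regularized least-squares estimate, and uses it to predict the value of a third arm that was never pulled.

```python
import random

# Hypothetical 2-D instance; every value below is an illustrative assumption.
THETA_STAR = (1.0, 0.5)                       # unknown parameter, hidden from the agent
ARMS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # feature vectors of the three arms
NOISE_STD = 0.1                               # standard deviation of the observation noise


def pull(arm, rng):
    """Return a noisy observation of the arm's value x^T theta*."""
    mean = arm[0] * THETA_STAR[0] + arm[1] * THETA_STAR[1]
    return mean + rng.gauss(0.0, NOISE_STD)


def least_squares(xs, ys, reg=1e-6):
    """Solve (sum x x^T + reg * I) theta = sum y x for 2-D features."""
    a11 = reg + sum(x[0] * x[0] for x in xs)
    a12 = sum(x[0] * x[1] for x in xs)
    a22 = reg + sum(x[1] * x[1] for x in xs)
    b1 = sum(y * x[0] for x, y in zip(xs, ys))
    b2 = sum(y * x[1] for x, y in zip(xs, ys))
    det = a11 * a22 - a12 * a12
    # Closed-form inverse of the symmetric 2x2 design matrix.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def run(n_pulls_per_arm=200, seed=0):
    rng = random.Random(seed)
    xs, ys = [], []
    # Only the first two arms are ever pulled.
    for arm in ARMS[:2]:
        for _ in range(n_pulls_per_arm):
            xs.append(arm)
            ys.append(pull(arm, rng))
    theta_hat = least_squares(xs, ys)
    # Predicted value of every arm, including the never-pulled third one.
    return [arm[0] * theta_hat[0] + arm[1] * theta_hat[1] for arm in ARMS]


predicted = run()
```

Under these assumptions, the prediction for the third arm should land close to its true value of 1.5 even though it received no pulls, which is the kind of information sharing a plain MAB setting does not allow. The uniform allocation over two arms used here is only a placeholder and is not one of the allocation strategies studied in the thesis.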

Cited literature [75 references]
Contributor : Marta Soare
Submitted on : Wednesday, December 30, 2015 - 5:21:14 PM
Last modification on : Tuesday, November 24, 2020 - 2:18:21 PM
Long-term archiving on : Tuesday, April 5, 2016 - 1:48:22 PM


  • HAL Id : tel-01249224, version 1


Marta Soare. Sequential Resource Allocation in Linear Stochastic Bandits. Machine Learning [cs.LG]. Université Lille 1 - Sciences et Technologies, 2015. English. ⟨tel-01249224⟩


