Budgeted Classification-based Policy Iteration

Victor Gabillon 1, 2
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : This dissertation is motivated by the study of a class of reinforcement learning (RL) algorithms, called classification-based policy iteration (CBPI). Contrary to the standard RL methods, CBPI do not use an explicit representation for value function. Instead, they use rollouts and estimate the action-value function of the current policy at a collection of states. Using a training set built from these rollout estimates, the greedy policy is learned as the output of a classifier. Thus, the policy generated at each iteration of the algorithm, is no longer defined by a (approximated) value function, but instead by a classifier. In this thesis, we propose new algorithms that improve the performance of the existing CBPI methods, especially when they have a fixed budget of interaction with the environment. Our improvements are based on the following two shortcomings of the existing CBPI algorithms: 1) The rollouts that are used to estimate the action-value functions should be truncated and their number is limited, and thus, we have to deal with bias-variance tradeoff in estimating the rollouts, and 2) The rollouts are allocated uniformly over the states in the rollout set and the available actions, while a smarter allocation strategy could guarantee a more accurate training set for the classifier. We propose CBPI algorithms that address these issues, respectively, by: 1) the use of a value function approximation to improve the accuracy (balancing the bias and variance) of the rollout estimates, and 2) adaptively sampling the rollouts over the state-action pairs.
Document type :
Theses
Complete list of metadatas

Cited literature [103 references]  Display  Hide  Download

https://tel.archives-ouvertes.fr/tel-01297386
Contributor : Preux Philippe <>
Submitted on : Monday, April 4, 2016 - 11:42:33 AM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Tuesday, July 5, 2016 - 12:41:44 PM

Identifiers

  • HAL Id : tel-01297386, version 1

Citation

Victor Gabillon. Budgeted Classification-based Policy Iteration. Machine Learning [stat.ML]. Universite Lille 1, 2014. English. ⟨tel-01297386⟩

Share

Metrics

Record views

321

Files downloads

251