Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety

Yannis Flet-Berliac 1, 2
2 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : One major challenge of reinforcement learning is to explore an environment efficiently in order to learn optimal policies through trial and error. To achieve this, the agent must be able to learn effectively from past experiences, enabling it to form an accurate picture of the benefit of certain actions over others. Beyond that, an obvious but central issue is that what is not known must be explored, and the need to explore safely adds another layer of difficulty to the problem. These are the main issues that we address in this Ph.D. thesis. By deconstructing the actor-critic framework and developing alternative formulations of the underlying optimization problem using the notion of variance, we explore how deep reinforcement learning algorithms can more effectively solve continuous control problems, hard exploration environments and risk-sensitive tasks. The first part of the thesis focuses on the critic component of the actor-critic framework, also referred to as the value function, and on how to learn more efficiently to control agents in continuous control domains through distinct uses of the variance in the value function estimates. The second part of the thesis is concerned with the actor component of the actor-critic framework, also referred to as the policy. We propose adding a third element, an adversary, to the optimization problem that agents solve. The adversary is of the same nature as the RL agent but is trained to suggest actions that mimic the actor or counteract the constraints of our problem. It is represented by an averaged policy distribution, and the actor must differentiate its behavior from this distribution by maximizing the divergence between the two, which encourages the actor to explore more thoroughly in tasks where efficient exploration is a bottleneck, or to act more safely.
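The abstract does not give the exact objective, so the following is only an illustrative sketch of the general idea of rewarding an actor for diverging from an averaged policy distribution. The use of the KL divergence, the uniform mixture of past policies, the discrete action space, and all names (`average_policy`, `regularized_objective`, `beta`) are assumptions made for illustration, not the formulation used in the thesis.

```python
import math

def average_policy(policies):
    """Uniform mixture of discrete action distributions (stands in for the adversary)."""
    n = len(policies)
    return [sum(p[a] for p in policies) / n for a in range(len(policies[0]))]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(expected_return, actor, adversary, beta=0.1):
    """Expected return plus a bonus for diverging from the adversary (beta is hypothetical)."""
    return expected_return + beta * kl_divergence(actor, adversary)

# Hypothetical action distributions over three actions.
past_policies = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
adversary = average_policy(past_policies)

similar_actor = [0.65, 0.25, 0.10]    # close to the averaged distribution
different_actor = [0.10, 0.20, 0.70]  # favors a rarely chosen action

bonus_similar = kl_divergence(similar_actor, adversary)
bonus_different = kl_divergence(different_actor, adversary)
```

Under these assumptions, an actor that behaves like the averaged distribution earns almost no bonus, while one that puts probability on rarely chosen actions earns a larger one, which is the sense in which a divergence term can push exploration.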
https://tel.archives-ouvertes.fr/tel-03431652
Contributor : Yannis Flet-Berliac
Submitted on : Tuesday, November 16, 2021 - 6:24:54 PM
Last modification on : Thursday, November 18, 2021 - 3:56:49 AM

File

phd_thesis_yfb.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03431652, version 1

Citation

Yannis Flet-Berliac. Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety. Computer Science [cs]. Université de Lille - Faculté des Sciences et Technologies, 2021. English. ⟨tel-03431652⟩
