
Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety

Yannis Flet-Berliac
Scool, Inria Lille - Nord Europe; CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (UMR 9189)
Abstract: One major challenge of reinforcement learning is to efficiently explore an environment in order to learn optimal policies through trial and error. To achieve this, the agent must be able to learn effectively from past experience, forming an accurate picture of the benefit of certain actions over others. Beyond that, an obvious but central issue is that what is not known must be explored, and the need to explore safely adds another layer of difficulty. These are the main issues addressed in this Ph.D. thesis. By deconstructing the actor-critic framework and developing alternative formulations of the underlying optimization problem using the notion of variance, we explore how deep reinforcement learning algorithms can more effectively solve continuous control problems, hard-exploration environments, and risk-sensitive tasks. The first part of the thesis focuses on the critic component of the actor-critic framework, also referred to as the value function, and on how to control agents in continuous domains more efficiently through distinct uses of the variance in the value function estimates. The second part is concerned with the actor component of the actor-critic framework, also referred to as the policy. We introduce a third element into the optimization problem that agents solve: an adversary. The adversary is of the same nature as the RL agent but is trained to suggest actions that mimic the actor or counteract the constraints of the problem. It is represented by an averaged policy distribution from which the actor must differentiate its behavior by maximizing the divergence between the two policies, ultimately encouraging the actor to explore more thoroughly in tasks where efficient exploration is a bottleneck, or to act more safely.
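The actor/adversary idea in the abstract can be illustrated with a minimal sketch. This is an illustrative simplification, not the thesis's exact formulation: the function name `agac_style_advantage`, the choice of KL divergence, and the coefficient `beta` are assumptions made for the example. The point it shows is that an actor matching the averaged adversary policy receives no bonus, while one that deviates from it is rewarded, which drives exploration.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete action distributions (lists of probabilities)."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))

def agac_style_advantage(reward_advantage, actor_probs, adversary_probs, beta=0.1):
    """Augment the usual advantage with a divergence bonus: the actor is
    encouraged to behave differently from the (averaged) adversary policy.
    Hypothetical simplification of the actor/adversary objective."""
    bonus = kl_divergence(actor_probs, adversary_probs)
    return reward_advantage + beta * bonus

# An actor identical to the adversary gets no bonus...
same = agac_style_advantage(1.0, [0.5, 0.5], [0.5, 0.5])
# ...while one that deviates receives a strictly larger advantage.
diff = agac_style_advantage(1.0, [0.9, 0.1], [0.5, 0.5])
```

In a full algorithm this augmented advantage would replace the standard one inside an actor-critic update, while the adversary is fit to an average of the actor's past behavior; here only the bonus computation is sketched.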

Contributor: Yannis Flet-Berliac
Submitted on: Tuesday, November 16, 2021 - 6:24:54 PM
Last modification on: Thursday, March 24, 2022 - 3:43:01 AM
Long-term archiving on: Thursday, February 17, 2022 - 9:21:23 PM




HAL Id: tel-03431652, version 1


Yannis Flet-Berliac. Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety. Computer Science [cs]. Université de Lille - Faculté des Sciences et Technologies, 2021. English. ⟨tel-03431652⟩


