Large-scale automatic learning of autonomous agent behavior with structured deep reinforcement learning

Edward Beeching

Abstract

Autonomous robotic agents have begun to impact many aspects of our society, with applications in automated logistics, autonomous hospital porters, manufacturing and household assistants. The objective of this thesis is to explore Deep Reinforcement Learning approaches to planning and navigation in large and unknown 3D environments. In particular, we focus on tasks that require exploration and memory in simulated environments. An additional requirement is that learned policies should generalize to unseen map instances. Our long-term objective is the transfer of a learned policy to a real-world robotic system. Reinforcement learning algorithms learn from interaction. By acting with the objective of accumulating a task-based reward, an Embodied AI agent must learn to discover relevant semantic cues such as object recognition and obstacle avoidance, if these skills are pertinent to the task at hand. This thesis introduces the field of Structured Deep Reinforcement Learning and then describes five contributions that were published during the PhD. We start by creating a set of challenging memory-based tasks whose performance is benchmarked with an unstructured memory-based agent. We then demonstrate how the incorporation of structure, in the form of a learned metric map, differentiable inverse projective geometry and self-attention mechanisms, augments the unstructured agent, improving its performance and allowing us to interpret the agent’s reasoning process.
We then move from complex tasks in visually simple environments to more challenging environments with photo-realistic observations extracted from scans of real-world buildings. In this work we demonstrate that augmenting such an agent with a topological map can improve its navigation performance. We achieve this by learning a neural approximation of a classical path planning algorithm, which can be utilized on graphs with uncertain connectivity. Drawing on work undertaken over the course of a 4-month internship at the R&D department of Ubisoft, we demonstrate that structured methods can also be used for navigation and planning in challenging video game environments, where we couple a lower-level neural policy with a classical planning algorithm to improve long-distance planning and navigation performance in vast environments of 1 km × 1 km. Finally, we develop an open-source Deep Reinforcement Learning interface for the Godot Game Engine, allowing for the construction of complex virtual worlds and the learning of agent behaviors with a suite of state-of-the-art algorithms.

Autonomous robots have begun to impact many aspects of our society, with applications in, for example, automated logistics, autonomous hospital robots, industry and household assistants. The objective of this thesis is to explore deep reinforcement learning approaches to planning and navigation in large and unknown 3D environments. We focus in particular on tasks that require exploring and memorizing simulated environments. An additional constraint is that the learned policies must generalize to unseen maps. Our long-term objective is the transfer of a learned technique to a robotic system in the real world. Reinforcement learning algorithms learn from interactions. By acting with the objective of accumulating task-related rewards, an embodied AI must learn to discover semantic information such as object recognition and obstacle avoidance, if these skills are relevant to accomplishing the task. This thesis introduces the field of Structured Deep Reinforcement Learning and then describes five contributions that were published during the thesis. We begin by creating a set of complex memory-dependent tasks in order to compare performance against an agent with unstructured memory. We then demonstrate how incorporating structure such as a learned metric map, differentiable inverse projective geometry and self-attention mechanisms improves the agent’s performance, which allows us to analyze its reasoning process. We then move from visually simple environments to more difficult environments with photorealistic observations extracted from scans of real-world buildings.
In this work, we demonstrate that augmenting an agent with a topological map can improve its navigation performance. We achieve this by teaching it a neural approximation of a classical path planning algorithm, which can be used on graphs with uncertain connectivity. Then, based on work carried out during a four-month internship in the research and development department of Ubisoft, we demonstrate that structured methods can also be used for navigation and planning in complex video game environments. We combine a low-level neural policy with a classical planning algorithm to improve long-distance planning and navigation performance in vast environments of 1 km × 1 km.

Apprentissage automatique à grande échelle du comportement des agents autonomes avec apprentissage structuré par renforcement profond
