Autonomous and Online Generation of Skills Inferring Actions Adapted to Low-Level and High-Level Contextual States

Carlos Maestre

Abstract

Robots are expected to assist us in our daily tasks. To that end, they may need to perform different tasks in changing scenarios. The number of dissimilar scenarios a robot can face is unlimited, so it is plausible that a robot must learn to perform tasks autonomously. A task consists of generating an expected change, i.e. an effect, in the environment, the robot configuration, or both. The robot must therefore learn to perform the right action on the environment to obtain the expected effect. One approach to learning these actions is continuous interaction of the robot with its environment, focusing on those actions that produce effects on the environment. The acquired relation of applying an action on an object to obtain an effect is called an affordance. In recent years, much research effort has been devoted to affordance learning. Related work ranges from learning simple push actions in tabletop scenarios to defining complex cognitive architectures. These works rely on different building blocks, such as vision methods to identify the position of the objects, or predefined sensorimotor skills to generate effects in a constrained environment. The use of predefined actions eases the learning of affordances, producing rich and consistent information about the changes produced on an object. However, we claim that relying on these actions limits how well such experiments scale to dynamic and noisy environments. The current work addresses the autonomous learning of a set of sensorimotor skills through interactions with an environment. Each skill must generate a continuous action to reproduce an effect on an object, adapted to the object position. In addition, each skill is simultaneously adapted to low-level perturbations, e.g. a change in the object position, and high-level contextual changes, e.g. a stove being turned on.
A few questions arise while addressing skill generation. First, how can a robot explore an environment and gather information with limited a priori information about it? We address this question through babbling in the environment driven by intrinsic motivation. We define a method, called Novelty-driven Evolutionary Babbling (NovEB), to explore the robot’s possible movements while focusing on those that generate the highest novelty from the perception point of view. Perception relies on raw images gathered through the robot’s cameras. A simulated PR2 robot using this method discovers on its own which regions of the workspace generate novel perceptions and focuses its exploration around them. Second, how can a robot autonomously build a set of skills based on initial information about the environment? We propose a method, named Adaptive Affordance Learning (A2L), which endows a robot with the capacity to learn affordances associated with an object, both adapting the robot’s skills to the object position and increasing the robot’s information about the object when needed. Two main contributions are presented: (1) an interaction process with the object that adapts each movement to the fixed object position, decomposing each action into a sequence of discrete movements; (2) an iterative process to increase the information about the object. These contributions are assessed in two experiments where a robot learns to push a box to different positions on a table: first in a virtual setup with a simulated robotic arm, then on a simulated Baxter robot. Finally, we extend the previous skill generation to environments including both low-level and high-level perturbations. Initially, one or more kinaesthetic demonstrations of an action producing an effect on the object are provided to the robot, through a Learning from Demonstration approach.
Then, a vector field is computed for each demonstration, generating information about the next movement to execute based on the robot context, which is composed of the position of the object relative to the robot’s end-effector and other high-level information. An action generator is learned, inferring in closed loop the next movement to reproduce an effect on the object based on the current robot context. In this work, a study is performed to select the best parametrization for building a push-to-the-right skill and a grasp skill that reproduce an effect. The selected parametrization is then used to build a set of diverse skills, which are validated in several experiments performing tasks with different objects. The assessment of the built skills is performed directly on a physical Baxter.
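The closed-loop idea above (a vector field built from a demonstration, queried with the current context to infer the next movement) can be illustrated with a minimal 2D sketch. It is not the method used in the thesis: the function names `build_vector_field`, `next_movement` and `run_skill` are hypothetical, the context is reduced to the object position relative to the end-effector (no high-level information), and the inference step is replaced here by a simple inverse-distance-weighted average over the nearest demonstrated samples, which is an assumption made only for illustration.

```python
import math


def build_vector_field(demo, obj):
    """Turn one demonstrated end-effector path into (context, movement) samples.

    context  = object position relative to the end-effector (assumed 2D)
    movement = displacement towards the next demonstrated waypoint
    """
    field = []
    for (x0, y0), (x1, y1) in zip(demo, demo[1:]):
        context = (obj[0] - x0, obj[1] - y0)
        movement = (x1 - x0, y1 - y0)
        field.append((context, movement))
    return field


def next_movement(field, context, k=3):
    """Hypothetical inference step: inverse-distance-weighted mean of the
    movements of the k samples whose context is closest to the current one."""
    nearest = sorted(field, key=lambda s: math.dist(s[0], context))[:k]
    weights = [1.0 / (math.dist(c, context) + 1e-6) for c, _ in nearest]
    total = sum(weights)
    dx = sum(w * m[0] for w, (_, m) in zip(weights, nearest)) / total
    dy = sum(w * m[1] for w, (_, m) in zip(weights, nearest)) / total
    return dx, dy


def run_skill(field, effector, obj, steps=20):
    """Closed loop: recompute the context before every movement, so the
    generated action follows the object position rather than a fixed path."""
    path = [effector]
    for _ in range(steps):
        context = (obj[0] - effector[0], obj[1] - effector[1])
        dx, dy = next_movement(field, context)
        effector = (effector[0] + dx, effector[1] + dy)
        path.append(effector)
    return path
```

Because the samples are indexed by the relative position of the object, a field built from a demonstration with the object at one location can be replayed with `run_skill` when the object sits elsewhere. A high-level contextual variable could in principle be appended to the context tuple, but how the thesis encodes it is not stated here.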

