Enhancing Human-Robot Interaction with Computer Vision

Yuming Du

Abstract

In recent years, robotics has made substantial progress, particularly toward robots that can interact naturally with humans in complex environments. Central to this aim is the need for robots to comprehend their surroundings, anticipate human actions, and adjust their movements in response. In this thesis, we address three challenges: improving a robot's scene understanding, forecasting and synthesizing human motion, and transferring acquired human motion knowledge to robot movements.

In the first part of this thesis, we improve a robot's ability to understand its surroundings on multiple levels. We introduce a framework that enables robots to autonomously identify and segment objects in open-world environments using self-training (published at IEEE ICCV 2021). By leveraging deep learning techniques, our approach enables robots to learn efficiently from their surroundings and recognize previously unseen objects. In addition to object discovery, we introduce a method to improve the robot's monocular depth estimation (published at IEEE CVPR 2020). This further refines the robot's understanding of its environment by providing more accurate depth information from a single camera viewpoint. Together, these advancements strengthen the robot's adaptability and performance in complex and dynamic situations.

In the second part, we focus on improving the robot's ability to understand and predict human motion. We present two distinct methods that anticipate human motion from either past human motion (published at IEEE WACV 2023) or observed partial joint movements (published at IEEE CVPR 2023). By foreseeing human motion, robots can collaborate more effectively with humans, which is crucial for safe and efficient human-robot interaction.

In the third and final part of this thesis, we tackle the challenge of converting learned human motion into robot movements. We introduce a method that adapts human grasp demonstrations to any multi-fingered gripper, allowing robots to manipulate objects intuitively and effectively (published at IEEE IROS 2022). By combining kinematic mapping and optimization techniques, our approach ensures that the adapted grasps are both physically viable and resilient, so that robots can carry out intricate manipulation tasks in human-centered environments.

By combining the three components proposed in this thesis, we seek to substantially enhance a robot's capacity to interact with humans in complex environments. The proposed improvements in scene understanding, human motion prediction, and grasp adaptation aim to make human-robot collaboration more seamless, safe, and efficient across domains and applications. Collectively, these advancements lay the foundation for more sophisticated and intuitive robotic systems that can adapt to the dynamic and evolving nature of human environments.

