
Synthesis of Behaviours through Parallel Reinforcement Learning: Application to the Control of a Planar Micromanipulator

Abstract: In microrobotics, controlling systems is difficult because the physical phenomena at the microscopic scale are complex. Reinforcement learning methods are an attractive approach because they make it possible to derive a control policy without any prior knowledge of the system. Given the large state spaces of the systems studied, we developed a parallel approach inspired by both behaviour-based architectures and reinforcement learning. This architecture is based on parallel Q-Learning algorithms. It reduces the complexity of the system and speeds up learning. On the gridworld example the results are good, but the learning time is too long to control a real system. The Q-Learning algorithm was therefore replaced by the Dyna-Q algorithm, which we adapted to the control of non-deterministic systems by keeping a chronological record of the most recent transitions. This architecture, called parallel Dyna-Q, increases the convergence speed and also finds better control policies. Experiments on the real manipulation system show that learning is possible in real time, without any need for simulation. The behaviour coordination function works well when the obstacles are well separated from one another. When they are not, it can create local maxima that temporarily trap the system in a cycle. We therefore developed another coordination function, which builds a more global model of the system from the transition model learned by the Dyna-Q algorithm. This new coordination function allows the system to escape local maxima, provided that the temporal pattern-matching function used by the architecture is robust.
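The abstract does not give the algorithmic details, so the following is only a rough illustration: a minimal tabular Dyna-Q sketch in Python. It assumes that "a chronological record of the most recent transitions" means the model stores, for each state-action pair, the last `history` observed (reward, next state) outcomes in a bounded queue and samples among them during planning. The class `DynaQ`, the environment `corridor_step`, the function `train` and all parameter values are hypothetical; the parallel decomposition into several behaviours and the coordination functions described above are not shown.

```python
import random
from collections import defaultdict, deque


class DynaQ:
    """Tabular Dyna-Q sketch (hypothetical). ASSUMPTION: the model keeps, for each
    (state, action) pair, only the last `history` observed outcomes, so planning
    replays the recent empirical behaviour of a non-deterministic system."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 planning_steps=20, history=5, seed=None):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.q = defaultdict(float)  # (state, action) -> action-value estimate
        # (state, action) -> bounded queue of (reward, next_state, terminal);
        # the oldest outcome is dropped automatically once `history` is reached
        self.model = defaultdict(lambda: deque(maxlen=history))
        self.rng = random.Random(seed)

    def greedy(self, state):
        """Action with the highest Q-value, ties broken at random."""
        best = max(self.q[(state, a)] for a in self.actions)
        return self.rng.choice([a for a in self.actions if self.q[(state, a)] == best])

    def act(self, state):
        """Epsilon-greedy exploration."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def _backup(self, s, a, r, s2, terminal):
        """One-step Q-Learning update."""
        target = r if terminal else r + self.gamma * max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def learn(self, s, a, r, s2, terminal):
        self._backup(s, a, r, s2, terminal)           # direct learning from the real transition
        self.model[(s, a)].append((r, s2, terminal))  # update the recent-history model
        visited = list(self.model)
        for _ in range(self.planning_steps):          # planning with simulated transitions
            ps, pa = self.rng.choice(visited)
            pr, ps2, pt = self.rng.choice(self.model[(ps, pa)])
            self._backup(ps, pa, pr, ps2, pt)


def corridor_step(s, a, rng, length=6, slip=0.2):
    """Hypothetical noisy 1-D corridor: with probability `slip` the move fails
    and the agent stays put; reaching the last cell ends the episode with reward 1."""
    if rng.random() >= slip:
        s = min(max(s + a, 0), length - 1)
    done = s == length - 1
    return (1.0 if done else 0.0), s, done


def train(episodes=300, seed=0):
    """Run epsilon-greedy Dyna-Q episodes on the hypothetical corridor."""
    env_rng = random.Random(seed)
    agent = DynaQ(actions=[-1, +1], seed=seed)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            a = agent.act(s)
            r, s2, done = corridor_step(s, a, env_rng)
            agent.learn(s, a, r, s2, done)
            s = s2
            if done:
                break
    return agent


if __name__ == "__main__":
    agent = train()
    print([agent.greedy(s) for s in range(5)])  # greedy action in each non-terminal cell
```

Sampling uniformly from the bounded queue approximates the outcome probabilities observed recently, so the model can follow a system whose behaviour drifts over time; in this sketch a larger `history` gives smoother estimates but adapts more slowly.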
Document type: Thesis

https://tel.archives-ouvertes.fr/tel-00008761
Contributor: Guillaume Laurent
Submitted on: Monday, March 14, 2005 - 8:34:19 AM
Last modification on: Thursday, November 12, 2020 - 9:42:34 AM
Long-term archiving on: Friday, April 2, 2010 - 9:34:56 PM

Identifiers

  • HAL Id: tel-00008761, version 1

Citation

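Guillaume Laurent. Synthèse de comportements par apprentissages par renforcement parallèles : application à la commande d'un micromanipulateur plan [Synthesis of Behaviours through Parallel Reinforcement Learning: Application to the Control of a Planar Micromanipulator]. Automatic Control / Robotics. Université de Franche-Comté, 2002. In French. ⟨tel-00008761⟩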
