Autonomous learning in a neuromorphic vision system : an active efficient coding spiking model applied to vision

Thomas Barbier

Abstract

Biological systems continuously adapt their neural representations to the statistics of their sensory input signals in order to operate efficiently. However, those statistics are shaped directly by the organism's behavior when sampling the environment. In the case of vision, the organism must therefore solve the complex problem of jointly learning visual encoding and eye control without external supervision. This autonomous joint learning of visual representations and actions has previously been modeled in the Active Efficient Coding (AEC) framework and implemented using traditional frame-based cameras as visual sensory inputs. This type of sensor is well studied and used in many visual applications. Nevertheless, its performance in terms of acquisition rate, dynamic range, and power consumption remains far from the capabilities of biological vision systems. Event-based cameras are a new type of vision sensor. Inspired by the mammalian eye, they imitate the early stages of the visual pathway, such as the retina. Each pixel operates independently and emits a signal when it detects a sufficiently large change in light intensity. The sensor operates on a very short timescale (on the order of a few microseconds), allowing it to capture fast movements with high precision. A static scene produces no output, which avoids redundant information. Finally, the asynchronous nature of the sensor drastically reduces power consumption (to a few milliwatts) and increases the dynamic range (to more than 120 dB). These features come with a challenge: the asynchronous output of event-based cameras is not well suited to conventional computer vision. Many popular algorithms, such as artificial neural networks, depend on discrete operations, making them incompatible with this new type of sensor. Spiking Neural Networks (SNNs) are bio-inspired networks that try to reproduce the computations of biological neuronal systems. They are especially well suited to event-based cameras. Building on the AEC framework and using these novel event-based sensors as sensory input, we aim to create a system capable of learning smooth eye pursuit and vergence based on efficient coding representations of its environment. Our model is composed of three main stages. The first stage comprises the sensory inputs, provided by a stereoscopic pair of event-based cameras mounted on a moving robotic head. The second stage comprises a two-layer SNN, which encodes the sensory inputs into an efficient visual representation. This visual representation is fed into the third stage, a spiking reinforcement learner, which is responsible for learning motor commands that maximize a reward signal. This reward signal is computed directly from the activity levels of the efficient coding layer, which is modulated by plastic inhibitory connections learned on specific visual patterns. Our work on the second stage has been extensively described in [1] and [2]. The spiking neural network is capable of learning orientation, disparity, and motion representations and of efficiently tuning to the statistics of the scene using a modified Spike-Timing Dependent Plasticity (STDP) rule. It is composed of two layers, the simple and complex cells, directly inspired by neurons found in the early mammalian visual pathways.
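
The event-generation principle described above (a pixel emits a signal when the change in light intensity is large enough, and a static scene produces nothing) can be illustrated with a minimal single-pixel sketch. This is an idealized model and not the sensor or code used in this work: the function name `generate_events`, the use of log intensity, and the threshold value of 0.2 are all assumptions made for illustration.

```python
def generate_events(log_intensity, timestamps, threshold=0.2):
    """Idealized event generation for a single pixel (illustrative assumption).

    log_intensity: sequence of log-intensity samples seen by the pixel.
    timestamps:    sequence of sample times, same length as log_intensity.
    threshold:     hypothetical contrast threshold that triggers an event.

    Returns a list of (timestamp, polarity) tuples, with polarity +1 for a
    brightness increase and -1 for a decrease.
    """
    events = []
    # The pixel memorizes the intensity level at which it last fired.
    reference = log_intensity[0]
    for t, value in zip(timestamps[1:], log_intensity[1:]):
        # Emit one event per threshold crossing until the memorized level
        # has caught up with the current intensity.
        while abs(value - reference) >= threshold:
            polarity = 1 if value > reference else -1
            reference += polarity * threshold
            events.append((t, polarity))
    return events


# A static scene produces no events; a brightness step produces a short burst.
static_events = generate_events([1.0, 1.0, 1.0], [0, 1, 2])
step_events = generate_events([0.0, 0.5], [0, 10])
```

In this sketch the output is sparse by construction: events are produced only when the input changes, which is the property that removes redundant information from static scenes.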

Biological systems continuously adapt their neural representations to the statistics of their sensory input signals in order to operate efficiently. However, these statistics are shaped directly by the organism's behavior when it samples the environment. In the case of vision, the organism must therefore solve a complex problem of jointly learning visual encoding and eye control without external supervision. This autonomous joint learning of visual representations and actions has previously been modeled in the Active Efficient Coding (AEC) framework and implemented using traditional frame-based cameras as visual sensory inputs. This type of sensor is well studied and used in many visual applications. Nevertheless, its performance in terms of acquisition rate, dynamic range, and power consumption remains far from the capabilities of biological vision systems. Event-based cameras are a new type of vision sensor. Inspired by the mammalian eye, they imitate the early stages of the visual pathway, such as the retina. Each pixel is independent and emits a signal when it detects a sufficiently large change in light intensity. However, the asynchronous output of event-based cameras is not suited to conventional computer vision. Many popular algorithms, such as artificial neural networks, depend on discrete operations, which makes them incompatible with this new type of sensor. Spiking neural networks (SNNs) are bio-inspired networks that try to reproduce the computations of biological neuronal systems. They are particularly well suited to event-based cameras. Building on the AEC framework and using these new event-based sensors as sensory input, we aim to create a system capable of learning smooth eye pursuit and vergence based on efficient coding representations of its environment.

Apprentissage autonome dans un système de vision neuromorphique : un modèle de codage impulsionnel actif et efficace appliqué à la vision
