Evaluating Computational Models of Vision with Functional Magnetic Resonance Imaging

Abstract : Blood-oxygen-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) measures brain activity through blood flow to areas with metabolically active neurons. In this thesis we use these measurements to evaluate how well biologically inspired vision models from computer vision represent image content in a way similar to the human brain. The main vision models used are convolutional networks.

Deep neural networks have made unprecedented progress in many fields in recent years. Even strongholds of biological systems such as scene analysis and object detection have been addressed with enormous success. A body of prior work has established firm links between the first and last layers of deep convolutional nets and brain regions: the first layer and V1 both essentially perform edge detection, and the last layer, like inferotemporal cortex, permits a linear read-out of object category. In this work we generalize this correspondence to all intermediate layers of a convolutional net. We found that each layer of a convnet maps to a stage of processing along the ventral stream, following the hierarchy of biological processing: along the ventral stream we observe a stage-by-stage increase in complexity. This provides, for the first time, a toolbox for studying the intermediate processing steps between edge detection and object detection. A preliminary result was obtained by studying the response of the visual areas to the presentation of visual textures and analysing it with convolutional scattering networks.

The other global aspect of this thesis is “decoding” models. In the preceding part, we predicted brain activity from the stimulus presented (this is called “encoding”). Predicting a stimulus from brain activity is the inverse inference mechanism and can be used as an omnibus test for the presence of this information in the brain signal. Most often, generalized linear models such as linear or logistic regression or SVMs are used for this task, giving access to a coefficient vector of the same size as a brain sample, which can thus be visualized as a brain map. However, interpretation of these maps is difficult, because the underlying linear system is either ill-defined and ill-conditioned or inadequately regularized, resulting in non-informative maps. Assuming a sparse and spatially contiguous organization of coefficient maps, we build on the convex penalty consisting of the sum of the total variation (TV) seminorm and the L1 norm (“TV+L1”) to develop a penalty that groups an activation term with a spatial derivative. This penalty sets most coefficients to zero but permits free smooth variations in active zones, as opposed to TV+L1, which creates flat active zones. This method improves the interpretability of the brain maps obtained when the best hyperparameter is selected by cross-validation.

In the context of encoding and decoding models, we also work on improving data preprocessing in order to obtain the best performance. We study the impulse response of the BOLD signal: the hemodynamic response function. To generate activation maps, instead of using a classical linear model with a fixed canonical response function, we use a bilinear model with a hemodynamic response that varies across space but is fixed across events. We propose an efficient optimization algorithm and show a gain in predictive capacity for encoding and decoding models on different datasets.
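
The contrast between TV+L1 and a penalty that groups an activation term with a spatial derivative can be sketched on a one-dimensional coefficient map. The abstract does not give the exact formula used in the thesis, so the grouped form below (one l2 norm per coefficient over the pair of its value and its forward difference) is an assumption made for illustration, as are the function names and the mixing weight `rho`.

```python
import math


def forward_diff(w):
    """Forward finite differences of a 1D map; the last entry is padded with 0."""
    return [w[i + 1] - w[i] for i in range(len(w) - 1)] + [0.0]


def tv_l1(w, rho=0.5):
    """TV+L1 penalty: rho * ||w||_1 + (1 - rho) * TV(w)."""
    l1 = sum(abs(x) for x in w)
    tv = sum(abs(d) for d in forward_diff(w))
    return rho * l1 + (1 - rho) * tv


def grouped_penalty(w, rho=0.5):
    """Assumed grouped form: per coefficient, the l2 norm of
    (rho * value, (1 - rho) * forward difference), summed over the map."""
    return sum(
        math.hypot(rho * x, (1 - rho) * d)
        for x, d in zip(w, forward_diff(w))
    )


# A flat active zone and a smooth bump, both surrounded by zeros.
flat = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
bump = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]

for name, w in [("flat", flat), ("bump", bump)]:
    print(name, tv_l1(w), grouped_penalty(w))
```

In this sketch both penalties are zero on an all-zero map, so sparsity is still encouraged. Because each coefficient and its derivative share a single norm, the grouped penalty never exceeds TV+L1 on the same map, and nonzero derivatives inside an active zone are not penalized separately from the activation itself.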

Contributor : Abes Star
Submitted on : Wednesday, March 23, 2016 - 5:12:06 PM
Last modification on : Monday, February 10, 2020 - 6:13:43 PM
Long-term archiving on: Friday, June 24, 2016 - 1:48:11 PM


  • HAL Id : tel-01292787, version 1


Michael Eickenberg. Evaluating Computational Models of Vision with Functional Magnetic Resonance Imaging. Computer Vision and Pattern Recognition [cs.CV]. Université Paris Sud - Paris XI, 2015. English. ⟨NNT : 2015PA112206⟩. ⟨tel-01292787⟩


