Spatial information and end-to-end learning for visual recognition

Abstract : In this thesis, we present our research on visual recognition and machine learning. Two visual recognition problems are investigated: human action recognition and human body part segmentation. Our objective is to integrate spatial information, such as the configuration of labels in feature space or the spatial layout of labels, into an end-to-end framework to improve recognition performance. For human action recognition, we apply the bag-of-words model and reformulate it as a neural network for end-to-end learning. We propose two algorithms that use the label configuration in feature space to optimize the codebook. The first is based on classical error backpropagation: the codewords are adjusted by gradient descent. The second is based on cluster reassignment, where the cluster labels are reassigned for all the feature vectors in a Voronoi diagram. As a result, the codebook is learned in a supervised way. We demonstrate the effectiveness of the proposed algorithms on the standard KTH human action dataset. For human body part segmentation, we treat segmentation as a classification problem in which a classifier acts on each pixel. Two machine learning frameworks are adopted: randomized decision forests and convolutional neural networks (ConvNets). We integrate a priori information on the spatial part layout, in terms of pairs of labels or pairs of pixels, into the training procedure of both frameworks to make the classifier more discriminative, while the testing stage still performs pixelwise classification. Three algorithms are proposed: (i) the spatial part layout is integrated into the training procedure of randomized decision forests; (ii) spatial pre-training is proposed for feature learning in ConvNets; (iii) spatial learning is proposed for the logistic regression (LR) or multilayer perceptron (MLP) used for classification.
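As an illustration of the bag-of-words step mentioned in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes local feature vectors and a codebook stored as NumPy arrays; the function names, the softmax relaxation, and the `beta` sharpness parameter are hypothetical choices made for this example. The hard version assigns each feature to its nearest codeword (its Voronoi cell); the soft version is one common differentiable relaxation that would allow codewords to be adjusted by gradient descent.

```python
import numpy as np

def squared_distances(features, codebook):
    # Pairwise squared Euclidean distances, shape (n_features, n_codewords).
    diff = features[:, None, :] - codebook[None, :, :]
    return np.sum(diff ** 2, axis=2)

def hard_bow_histogram(features, codebook):
    # Each feature votes for its nearest codeword (Voronoi cell membership).
    d2 = squared_distances(features, codebook)
    nearest = np.argmin(d2, axis=1)
    counts = np.bincount(nearest, minlength=codebook.shape[0])
    return counts / counts.sum()

def soft_bow_histogram(features, codebook, beta=10.0):
    # Assumed relaxation: softmax over negative squared distances.
    # Larger beta makes the assignment closer to the hard version.
    d2 = squared_distances(features, codebook)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

# Toy data (made up for illustration): three 2-D features, two codewords.
features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])

hard = hard_bow_histogram(features, codebook)
soft = soft_bow_histogram(features, codebook)
```

Both functions return a normalized histogram over codewords that a classifier could take as input. Because the soft histogram is a smooth function of the codebook, a classification loss can in principle be backpropagated to the codewords.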
Contributor : Abes Star
Submitted on : Saturday, March 7, 2015 - 2:43:43 AM
Last modification on : Friday, May 17, 2019 - 10:17:14 AM
Long-term archiving on : Monday, June 8, 2015 - 2:51:25 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01127462, version 1


Mingyuan Jiu. Spatial information and end-to-end learning for visual recognition. Computer Science [cs]. INSA de Lyon, 2014. English. ⟨NNT : 2014ISAL0038⟩. ⟨tel-01127462⟩


