Skip to Main content Skip to Navigation

Effective and efficient visual description based on local binary patterns and gradient distribution for object recognition

Chao Zhu 1
1 imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : This thesis is dedicated to the problem of machine-based visual object recognition, which has become a very popular and important research topic in recent years because of its wide range of applications such as image/video indexing and retrieval, security access control, video monitoring, etc. Despite a lot of e orts and progress that have been made during the past years, it remains an open problem and is still considered as one of the most challenging problems in computer vision community, mainly due to inter-class similarities and intra-class variations like occlusion, background clutter, changes in viewpoint, pose, scale and illumination. The popular approaches for object recognition nowadays are feature & classifier based, which typically extract visual features from images/videos at first, and then perform the classification using certain machine learning algorithms based on the extracted features. Thus it is important to design good visual description, which should be both discriminative and computationally efficient, while possessing some properties of robustness against the previously mentioned variations. In this context, the objective of this thesis is to propose some innovative contributions for the task of visual object recognition, in particular to present several new visual features / descriptors which effectively and efficiently represent the visual content of images/videos for object recognition. The proposed features / descriptors intend to capture the visual information from different aspects. Firstly, we propose six multi-scale color local binary pattern (LBP) features to deal with the main shortcomings of the original LBP, namely deficiency of color information and sensitivity to non-monotonic lighting condition changes. By extending the original LBP to multi-scale form in different color spaces, the proposed features not only have more discriminative power by obtaining more local information, but also possess certain invariance properties to different lighting condition changes. In addition, their performances are further improved by applying a coarse-to-fine image division strategy for calculating the proposed features within image blocks in order to encode spatial information of texture structures. The proposed features capture global distribution of texture information in images. Secondly, we propose a new dimensionality reduction method for LBP called the orthogonal combination of local binary patterns (OC-LBP), and adopt it to construct a new distribution-based local descriptor by following a way similar to SIFT.Our goal is to build a more efficient local descriptor by replacing the costly gradient information with local texture patterns in the SIFT scheme. As the extension of our first contribution, we also extend the OC-LBP descriptor to different color spaces and propose six color OC-LBP descriptors to enhance the discriminative power and the photometric invariance property of the intensity-based descriptor. The proposed descriptors capture local distribution of texture information in images. Thirdly, we introduce DAISY, a new fast local descriptor based on gradient distribution, to the domain of visual object recognition.
Document type :
Complete list of metadatas

Cited literature [133 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Wednesday, November 21, 2012 - 4:12:11 PM
Last modification on : Wednesday, July 8, 2020 - 12:43:36 PM
Long-term archiving on: : Friday, February 22, 2013 - 3:49:46 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00755644, version 1


Chao Zhu. Effective and efficient visual description based on local binary patterns and gradient distribution for object recognition. Other. Ecole Centrale de Lyon, 2012. English. ⟨NNT : 2012ECDL0005⟩. ⟨tel-00755644⟩



Record views


Files downloads