Linear Combination of multiresolution descriptors: Application to Graphics Recognition

Oriol Ramos Terrades 1
1 LANGUE ET DIALOGUE - Human-machine dialogue with a significant language component
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In the field of Document Analysis, we would like to be able to process any kind of digital document automatically. By this we mean extracting the document layout and identifying each of its parts, recognising its contents, and organising them so that its components can be searched, both within the document itself and across different documents. This is a challenging problem that has motivated several lines of research in Document Analysis at different levels. Pre-processing techniques have been developed to improve the quality of the document image, reducing noise from the input devices and minimizing the effects of document degradation. Segmentation has been studied in depth in order to separate the regions of interest from the document background. Finally, many descriptors have been proposed for representing and identifying these regions of interest, from the end of the 1960s until now.

In this thesis, we have focused on this last problem, shape description, and also on classifier fusion, and we have applied them to one of the application fields of Document Analysis: graphics recognition. In shape recognition, many applications have to face the problem of describing a large number of complex shapes for recognition or retrieval in large databases. Besides the large number of shapes, there are other challenges for shape description, such as the similarity among some of the shapes or the variability within shape classes. In these cases, one of the key issues is the design of highly discriminant shape descriptors. Unfortunately, one kind of descriptor is usually not enough to achieve satisfactory results, and hence we have to combine the information from different sources to improve the global performance of the recognition system. We have carried out this combination of information using classifier fusion.

Concerning shape description, graphics have traditionally been represented using structural descriptors, which are based on a vectorial representation of the shape. Vectorization is quite sensitive to noise and to distortions of sketched symbols. We can try to overcome this problem using grammar descriptors or deformable models of shapes. Another possibility, which is the one followed in this dissertation, is to propose descriptors that do not need a vectorial representation of the symbol. Thus, in the context of shape description, we have proposed a descriptor based on the ridgelets transform. Having unified the terminology used in shape description, we can define it, with the vocabulary introduced, as a 2D, polar and multi-resolution descriptor that is information preserving and invariant to similarities. On the other hand, although the ridgelets descriptor can be considered as a single descriptor, it offers a shape representation divided into groups of coefficients, which allows us to treat each group as a descriptor in its own right. Thus, for each descriptor, we have trained a classifier, and we have proposed two linear combination rules, IN and DN, that minimize the classification error of classifiers satisfying a set of constraints concerning the dependence and the distribution of the classifiers.
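
As a rough illustration of what a linear combination of classifiers looks like in general, the following minimal sketch fuses per-class scores from several classifiers with a weighted sum and picks the class with the highest fused score. It is not the IN or DN rule, whose weight derivations are not given in this abstract; the function names `fuse_linear` and `predict`, the scores and the weights are all hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def fuse_linear(scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-class scores.

    scores[k][c] is the score that classifier k assigns to class c.
    weights[k] is the (hypothetical) weight of classifier k; how the
    weights are chosen is exactly what a fusion rule has to specify,
    and it is NOT reproduced here.
    """
    if len(scores) != len(weights):
        raise ValueError("one weight per classifier is required")
    n_classes = len(scores[0])
    fused = [0.0] * n_classes
    for clf_scores, w in zip(scores, weights):
        if len(clf_scores) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, s in enumerate(clf_scores):
            fused[c] += w * s
    return fused


def predict(scores: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_linear(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative data: three classifiers (e.g. one per group of
# coefficients) scoring two classes. Values are made up.
example_scores = [[0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
example_weights = [0.5, 0.25, 0.25]
print(fuse_linear(example_scores, example_weights))
print(predict(example_scores, example_weights))
```

With these made-up numbers, changing the weights can change the decision, which is why the choice of weights is the central question for any linear fusion rule.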


These theoretical approaches have been evaluated experimentally on ridgelets descriptors, on classifier fusion, and on the application of the classifier fusion methods to ridgelets descriptors, with the following results. Ridgelets descriptors have proven to represent graphics symbols better than general purpose descriptors. The IN and DN methods reduce the misclassification rates with respect to other reference fusion methods. Finally, the IN method applied to the ridgelets descriptor, in combination with boosting algorithms, has reached recognition rates close to 100% in the test defined for the GREC'03 database.
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00109597
Contributor : Oriol Ramos Terrades
Submitted on : Tuesday, October 24, 2006 - 8:35:21 PM
Last modification on : Friday, February 26, 2021 - 3:28:03 PM
Long-term archiving on : Thursday, September 20, 2012 - 12:20:45 PM

Identifiers

  • HAL Id : tel-00109597, version 1

Citation

Oriol Ramos Terrades. Linear Combination of multiresolution descriptors: Application to Graphics Recognition. Human-Computer Interaction [cs.HC]. Université Nancy II, 2006. English. ⟨tel-00109597⟩
