Cursive Bengali Script Recognition for Indian Postal Automation

Szilárd Vajda 1
1 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Large variations in writing style and the difficulty of segmenting cursive words are the main reasons why handwritten cursive word recognition is such a challenging task. An Indian postal document reading system based on a segmentation-free, context-based stochastic model is presented. The originality of the work lies in combining high-level perceptual features with the low-level pixel information considered by the former model, and in a pruning strategy in the Viterbi decoding that reduces recognition time. While the low-level information can be easily extracted from the analyzed form, its discriminative power is limited because it describes the shape with less precision. For that reason, within an analytical approach using implicit segmentation, we have introduced high-level information reduced to a lower level. This enrichment can be seen as a weight at pixel level, assigning an importance to each analyzed pixel based on its perceptual properties. The challenge is to combine the different types of features while taking into account a certain dependence between them. To reduce the decoding time of the Viterbi search, a cumulative threshold mechanism is proposed for a flat lexicon representation. Instead of using a trie representation, in which common prefixes are shared, we propose a threshold mechanism in the flat lexicon whereby, based on only a partial Viterbi analysis, a model can be pruned and its further processing stopped. The cumulative thresholds are based on matching scores calculated at each letter level, which gives the model a certain dynamism and elasticity. As we are interested in a complete postal address recognition system, we have also addressed digit recognition, proposing different neural and stochastic solutions. To increase the accuracy and robustness of the classifiers, a combination scheme is also proposed.
The results obtained on different datasets written in Latin and Bengali scripts demonstrate the value of the method, and the recognition module developed will be integrated into a generic system for Indian postal automation.
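The pruning idea described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it assumes a plain left-to-right word model in which each letter covers one or more consecutive frames, toy per-frame log scores, and a constant per-letter threshold that is accumulated letter by letter. The names `score_word`, `recognize`, `letter_threshold` and the toy lexicon are hypothetical and chosen only for illustration. Each lexicon entry is decoded independently (flat lexicon), and decoding of an entry stops as soon as its best partial score falls below the cumulative threshold.

```python
import math

NEG_INF = float("-inf")


def score_word(word, emit, letter_threshold):
    """Letter-by-letter Viterbi alignment of one lexicon word, with early stop.

    emit[t][c] is a (toy, assumed) log score of frame t under letter c.
    Returns (score, letters_processed); score is None if the word was pruned.
    """
    T = len(emit)
    if not word or len(word) > T:
        return None, 0  # no valid alignment: every letter needs at least one frame
    prev = None
    cumulative_threshold = 0.0
    for k, letter in enumerate(word):
        row = [NEG_INF] * T  # row[t]: best score with frame t assigned to letter k
        for t in range(k, T):
            stay = row[t - 1] if t > 0 else NEG_INF  # letter k continues
            if k == 0:
                enter = 0.0 if t == 0 else NEG_INF   # first letter starts at frame 0
            else:
                enter = prev[t - 1]                  # letter k starts at frame t
            row[t] = max(stay, enter) + emit[t][letter]
        # Log scores are <= 0 here, so max(row) bounds every completion of this word.
        cumulative_threshold += letter_threshold
        if max(row) < cumulative_threshold:
            return None, k + 1  # prune: stop processing this word model
        prev = row
    return prev[T - 1], len(word)


def recognize(lexicon, emit, letter_threshold):
    """Decode every entry of a flat lexicon and keep the best unpruned word."""
    best_word, best_score = None, NEG_INF
    pruned_at = {}  # word -> number of letters processed before pruning
    for word in lexicon:
        score, letters_done = score_word(word, emit, letter_threshold)
        if score is None:
            pruned_at[word] = letters_done
        elif score > best_score:
            best_word, best_score = word, score
    return best_word, best_score, pruned_at


# Toy data (assumption): ten frames, each labelled with the letter it resembles.
lexicon = ["delhi", "dhaka", "kolkata"]
alphabet = sorted(set("".join(lexicon)))
frames = "ddhhaakkaa"
emit = [{c: math.log(0.8 if c == f else 0.05) for c in alphabet} for f in frames]

best_word, best_score, pruned_at = recognize(lexicon, emit, letter_threshold=-2.0)
print(best_word, round(best_score, 3), pruned_at)
```

In this sketch a word that mismatches early is abandoned after a few letters instead of being aligned against all frames. Unlike a trie, no computation is shared between words with a common prefix; the saving comes only from stopping poor candidates early. The threshold value controls the trade-off: a stricter threshold prunes more but risks discarding the correct word.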
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-01748429
Contributor : Abdel Belaid
Submitted on : Friday, March 25, 2011 - 10:42:48 AM
Last modification on : Friday, May 18, 2018 - 12:21:48 PM
Long-term archiving on : Sunday, June 26, 2011 - 2:20:08 AM

Identifiers

  • HAL Id : tel-01748429, version 2

Citation

Szilárd Vajda. Cursive Bengali Script Recognition for Indian Postal Automation. Engineering Sciences [physics]. Université Henri Poincaré - Nancy 1, 2008. English. ⟨NNT : 2008NAN10083⟩. ⟨tel-01748429v2⟩

Metrics

Record views : 799
Files downloads : 377