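Extraction de structures de documents par champs aléatoires conditionnels : application aux traitements des courriers manuscrits (Document structure extraction with conditional random fields: application to the processing of handwritten mail)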

Abstract : The automatic processing of written documents is a very active field in industry. Given the volume of written documents to be processed, automatic analysis has become a necessity, but the performance of current systems varies widely with the type of document. For example, the processing of unconstrained handwritten documents remains an unresolved problem, because two technological obstacles hinder the development of reliable automatic processing: the first is the recognition of the handwriting in these documents; the second is the wide variability of document structures. This thesis addresses the second obstacle in the case of unconstrained handwritten documents. To this end, we have developed reliable and robust methods for analyzing document structures based on Conditional Random Fields. The choice of Conditional Random Fields is motivated by the ability of these graphical models to take into account the relationships between the various entities of the document (words, phrases, blocks, ...) and to integrate contextual knowledge. In addition, the use of probabilistic models capable of learning makes it possible to cope with the inherent variability of the documents to be processed. The originality of the thesis also lies in the proposal of a hierarchical approach for jointly extracting the physical structure (segmentation of the document into blocks, lines, ...) and the logical structure (functional interpretation of the physical structure) by combining low-level physical features (position, graphics, ...) with high-level logical features (keyword spotting). Experiments carried out on handwritten letters show that the proposed model is a promising solution, owing to its discriminative nature and its natural ability to integrate and contextualize features of different kinds.
Document type :
Theses
Document and Text Processing. Université de Rouen, 2011. French


https://tel.archives-ouvertes.fr/tel-00652301
Contributor : Florent Montreuil
Submitted on : Thursday, December 15, 2011 - 11:39:05 AM
Last modification on : Thursday, December 15, 2011 - 11:43:57 AM

Identifiers

  • HAL Id : tel-00652301, version 1

Citation

Florent Montreuil. Extraction de structures de documents par champs aléatoires conditionnels : application aux traitements des courriers manuscrits. Document and Text Processing. Université de Rouen, 2011. French. <tel-00652301>
