La restructuration des documents graphiques destructurés (Restructuring of unstructured graphical documents)

Abstract: This thesis deals with the restructuring of unstructured PDF documents that contain graphical elements such as schematics, plans and drawings. Starting from the KDD (Knowledge Discovery in Databases) method for data restructuring, we introduce the (A) KDD (Anthropocentric Knowledge Discovery in Databases) method, which we derived from KDD by adding an incremental aspect and a user-centered approach. In particular, we present a technique based on the bucket sort algorithm pattern for efficiently extracting the graphic symbols contained in a PDF file, and we compare its results with those obtained by Puglissi on strings. We then formulate the hypothesis: "taking into account the chronological order present in the PDF files in the incremental process improves the restructuring of the documents". We illustrate the validity of this hypothesis on several examples. Finally, we show the efficiency of the process in identifying symbols and equipotentials at the same time. The thesis concludes by presenting the advances and limits of the (A) KDD method and by proposing some perspectives.
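The abstract does not describe the extraction technique itself, so the following is only an illustrative sketch of how the scatter step of a bucket sort could group repeated graphic symbols. Everything in it is an assumption made for illustration: the representation of a path as a list of (x, y) points, the translation-invariant key, and the function names `normalize`, `bucket_paths` and `repeated_symbols` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def normalize(path):
    """Translate a path so that its bounding-box corner sits at the origin.

    Hypothetical key function: two paths with the same shape but different
    positions on the page yield the same key.
    """
    min_x = min(x for x, _ in path)
    min_y = min(y for _, y in path)
    return tuple((round(x - min_x, 2), round(y - min_y, 2)) for x, y in path)


def bucket_paths(paths):
    """Scatter step: place each path index into the bucket for its shape key."""
    buckets = defaultdict(list)
    for index, path in enumerate(paths):
        buckets[normalize(path)].append(index)
    return buckets


def repeated_symbols(paths, min_count=2):
    """Return groups of path indices whose shape occurs at least min_count times."""
    return [idxs for idxs in bucket_paths(paths).values() if len(idxs) >= min_count]


# Three translated copies of the same triangle and one unrelated segment.
paths = [
    [(0, 0), (1, 0), (1, 1)],
    [(5, 5), (6, 5), (6, 6)],
    [(0, 0), (2, 0)],
    [(10, 3), (11, 3), (11, 4)],
]
groups = repeated_symbols(paths)
```

Each path is visited once and dropped into a bucket, so candidate symbols are grouped in a single pass without comparing every path against every other one. A real PDF would also need handling for rotation, scaling and drawing order, which this sketch leaves out.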
Cited literature: 147 references

https://tel.archives-ouvertes.fr/tel-02453457
Contributor: Abes Star
Submitted on: Thursday, January 23, 2020 - 7:01:08 PM
Last modification on: Friday, January 24, 2020 - 2:00:31 AM
Long-term archiving on: Friday, April 24, 2020 - 5:09:51 PM

File

PERE-LAPERNE_JACQUES_2019.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02453457, version 1

Citation

Jacques Pere-Laperne. La restructuration des documents graphiques destructurés. Document and Text Processing. Université de Bordeaux, 2019. French. ⟨NNT : 2019BORD0226⟩. ⟨tel-02453457⟩

Metrics

Record views: 187
File downloads: 58