La restructuration des documents graphiques destructurés

Abstract: This thesis deals with the restructuring of unstructured PDF documents that contain graphical elements such as schematics, plans and drawings. Starting from the KDD (Knowledge Discovery in Databases) method for data restructuring, we introduce the (A) KDD (Anthropocentric Knowledge Discovery in Databases) method that we developed, which extends KDD with an incremental aspect and a user-centered approach. In particular, we present a technique based on the bucket sort algorithm pattern to efficiently extract the graphic symbols contained in a PDF file, and compare it with the results obtained by Puglissi on strings. We then formulate the hypothesis: "taking into account the chronological order present in the PDF files in the incremental process improves the restructuring of the documents". We illustrate the validity of this hypothesis on several examples. Finally, we show the efficiency of the process in identifying the symbols at the same time as the equipotentials. The thesis concludes by presenting the advances and the limits of the (A) KDD method and by proposing some perspectives.

Cited literature: 147 references
Contributor: Abes Star
Submitted on: Thursday, January 23, 2020 - 7:01:08 PM
Last modification on: Friday, January 24, 2020 - 2:00:31 AM
Long-term archiving on: Friday, April 24, 2020 - 5:09:51 PM


Version validated by the jury (STAR)


  • HAL Id: tel-02453457, version 1



Jacques Pere-Laperne. La restructuration des documents graphiques destructurés. Traitement du texte et du document. Université de Bordeaux, 2019. Français. ⟨NNT : 2019BORD0226⟩. ⟨tel-02453457⟩


