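Détection des fraudes : de l’image à la sémantique du contenu : application à la vérification des informations extraites d’un corpus de tickets de caisse (Fraud detection: from the image to the semantics of the content: application to the verification of information extracted from a corpus of sales receipts)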

Abstract : Companies, administrations, and sometimes individuals face frequent fraud involving documents that they receive from outside or process internally. Invoices, expense reports, receipts: any document used as proof can be falsified in order to gain money or to avoid losing it. In France, losses due to fraud are estimated at several billion euros per year. Because the flow of documents exchanged, whether digital or paper, is very large, having all of them checked by fraud detection experts would be extremely costly and time-consuming. This thesis therefore proposes a system for the automatic detection of false documents. While most work on automatic document fraud detection focuses on graphical clues, we seek to verify the textual information in the document in order to detect inconsistencies or implausibilities. To do this, we first compiled a corpus of documents, which we digitized. After correcting the character recognition output and falsifying part of the documents, we extracted the information and modelled it in an ontology in order to preserve the semantic links between the extracted items. The information thus extracted, enriched where possible by disambiguation, can be checked for internal consistency within the document and against the knowledge base thus established. The semantic links of the ontology also make it possible to search for information in other sources of knowledge, particularly on the Internet.
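To make the idea of checking extracted information against itself more concrete, here is a minimal sketch in Python. It is not the system described in the thesis: the `LineItem` and `Receipt` classes, the `find_inconsistencies` function, and the two checks are all hypothetical, and a plain data structure stands in for the ontology. It only illustrates how a stated total can be compared with the total recomputed from the extracted line items.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineItem:
    # Hypothetical fields for one purchased article extracted from a receipt.
    label: str
    quantity: int
    unit_price: Decimal


@dataclass
class Receipt:
    # Hypothetical container for the information extracted from one receipt.
    items: List[LineItem]
    stated_total: Decimal


def find_inconsistencies(receipt: Receipt) -> List[str]:
    """Return descriptions of internal inconsistencies (illustrative checks only)."""
    problems = []

    # Implausibility check: quantities must be positive, prices non-negative.
    for item in receipt.items:
        if item.quantity <= 0 or item.unit_price < 0:
            problems.append(f"implausible line: {item.label}")

    # Consistency check: the stated total should equal the recomputed total.
    computed = sum(
        (item.quantity * item.unit_price for item in receipt.items),
        Decimal("0"),
    )
    if computed != receipt.stated_total:
        problems.append(
            f"stated total {receipt.stated_total} differs from computed total {computed}"
        )
    return problems


# Made-up example data: a genuine receipt and a copy with an altered total.
items = [
    LineItem("bread", 2, Decimal("1.20")),
    LineItem("milk", 1, Decimal("0.95")),
]
genuine = Receipt(items, Decimal("3.35"))
altered = Receipt(items, Decimal("8.35"))
```

Calling `find_inconsistencies(genuine)` returns an empty list, while `find_inconsistencies(altered)` reports the mismatch between the stated and recomputed totals. `Decimal` is used rather than floating point so that monetary amounts compare exactly.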
Document type :
Theses

Cited literature [218 references]

https://tel.archives-ouvertes.fr/tel-02318371
Contributor : Abes Star
Submitted on : Thursday, October 17, 2019 - 8:55:26 AM
Last modification on : Wednesday, October 14, 2020 - 3:55:16 AM
Long-term archiving on : Saturday, January 18, 2020 - 12:54:10 PM

File

2019Artaud124158.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02318371, version 1

Citation

Chloé Artaud. Détection des fraudes : de l’image à la sémantique du contenu : application à la vérification des informations extraites d’un corpus de tickets de caisse. Traitement du texte et du document. Université de La Rochelle, 2019. Français. ⟨NNT : 2019LAROS002⟩. ⟨tel-02318371⟩

Metrics

Record views : 358
File downloads : 303