Evaluating data veracity: improving truth discovery using a priori knowledge (original title: Évaluation de la véracité des données : améliorer la découverte de la vérité en utilisant des connaissances a priori)

Abstract : The notion of data veracity is receiving increasing attention because of the problem of misinformation and fake news. As more and more information is published online, it is becoming essential to develop models that automatically evaluate information veracity. Indeed, evaluating data veracity is very difficult for humans: they are affected by confirmation bias, which prevents them from objectively evaluating the reliability of information. Moreover, the amount of information available nowadays makes this task time-consuming, so the computational power of computers is required. It is therefore critical to develop methods that are able to automate this task.

In this thesis we focus on Truth Discovery models. These approaches address the data veracity problem when multiple sources provide conflicting values for the same properties of real-world entities. They aim to identify which claims are true among the set of conflicting ones. More precisely, they are unsupervised models based on the rationale that true information is provided by reliable sources and reliable sources provide true information.

The main contribution of this thesis consists in improving Truth Discovery models by considering a priori knowledge expressed in ontologies. This knowledge may facilitate the identification of true claims. Two particular aspects of ontologies are considered. First, we explore the semantic dependencies that may exist among different values, i.e. the ordering of values through certain conceptual relationships. Indeed, two different values are not necessarily conflicting: they may represent the same concept at different levels of detail. In order to integrate this kind of knowledge into existing approaches, we use the mathematical model of partial orders. Second, we consider recurrent patterns that can be derived from ontologies. This additional information reinforces the confidence in certain values when certain recurrent patterns are observed. In this case, we model recurrent patterns using rules. Experiments conducted on both synthetic and real-world datasets show that a priori knowledge enhances existing models and paves the way towards a more reliable information world. Source code as well as synthetic and real-world datasets are freely available.
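The mutual reinforcement between source reliability and claim truth described in the abstract can be sketched as a simple fixed-point iteration. The block below is only an illustrative sketch, not the models developed in the thesis: the function name `truth_discovery`, the sum-and-normalise update, the `ancestors` mapping (a hand-written stand-in for a partial order derived from an ontology), and the toy claims are all assumptions made for this example.

```python
from collections import defaultdict


def truth_discovery(claims, ancestors=None, iterations=20):
    """Toy truth discovery by mutual reinforcement (illustrative only).

    claims:    list of (source, item, value) triples.
    ancestors: hypothetical mapping value -> set of more general values,
               standing in for a partial order taken from an ontology.
    Returns (trust per source, confidence per (item, value)).
    """
    ancestors = ancestors or {}
    trust = {source: 1.0 for source, _, _ in claims}
    confidence = {}
    for _ in range(iterations):
        # Confidence of a value: summed trust of the sources supporting it.
        # Assumption: claiming a value also supports its generalisations.
        raw = defaultdict(float)
        for source, item, value in claims:
            for v in {value} | set(ancestors.get(value, ())):
                raw[(item, v)] += trust[source]
        top = max(raw.values())
        confidence = {key: c / top for key, c in raw.items()}

        # Trust of a source: summed confidence of the values it claims.
        raw_trust = defaultdict(float)
        for source, item, value in claims:
            raw_trust[source] += confidence[(item, value)]
        top = max(raw_trust.values())
        trust = {s: t / top for s, t in raw_trust.items()}
    return trust, confidence


# Toy data: "Paris" and "France" differ but do not conflict.
claims = [
    ("s1", "birthplace", "Paris"),
    ("s2", "birthplace", "France"),
    ("s3", "birthplace", "Berlin"),
    ("s1", "capital", "Rome"),
    ("s2", "capital", "Rome"),
    ("s3", "capital", "Milan"),
]
ancestors = {"Paris": {"France"}, "Berlin": {"Germany"}}

trust, confidence = truth_discovery(claims, ancestors)
for key, score in sorted(confidence.items(), key=lambda kv: -kv[1]):
    print(key, round(score, 3))
```

In this sketch a general value can never score lower than its specialisations, because every claim on "Paris" also counts for "France". Choosing how specific the returned value should be is a separate decision that the sketch does not address.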
Cited literature : 20 references

https://tel.archives-ouvertes.fr/tel-01975534
Contributor : Abes Star
Submitted on : Wednesday, January 9, 2019 - 2:12:05 PM
Last modification on : Monday, February 11, 2019 - 6:22:02 PM

File

70711_BERETTA_2018_archivage.p...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01975534, version 1

Citation

Valentina Beretta. Évaluation de la véracité des données : améliorer la découverte de la vérité en utilisant des connaissances a priori. Computer Vision and Pattern Recognition [cs.CV]. IMT - MINES ALES - IMT - Mines Alès Ecole Mines - Télécom, 2018. French. ⟨NNT : 2018EMAL0002⟩. ⟨tel-01975534⟩

Metrics

Record views : 118

File downloads : 83