Data veracity assessment: enhancing Truth Discovery using a priori knowledge

Abstract : Data veracity is attracting increasing attention because of the problem of misinformation and fake news. As more and more information is published online, it is becoming essential to develop models that evaluate information veracity automatically. Evaluating data veracity is very difficult for humans: confirmation bias prevents them from objectively evaluating the reliability of information, and the amount of information available nowadays makes the task time-consuming. Computational power is required, and it is critical to develop methods that can automate this task.

In this thesis we focus on Truth Discovery models. These approaches address the data veracity problem when multiple sources provide conflicting values for the same properties of real-world entities. They aim to identify which claims are true among the set of conflicting ones. More precisely, they are unsupervised models based on the rationale that true information is provided by reliable sources and that reliable sources provide true information.

The main contribution of this thesis is to improve Truth Discovery models by considering a priori knowledge expressed in ontologies. This knowledge may facilitate the identification of true claims. Two particular aspects of ontologies are considered. First, we explore the semantic dependencies that may exist among different values, i.e. the ordering of values through certain conceptual relationships. Two different values are not necessarily conflicting: they may represent the same concept at different levels of detail. To integrate this kind of knowledge into existing approaches, we use the mathematical model of partial orders. Second, we consider recurrent patterns that can be derived from ontologies. This additional information reinforces the confidence in certain values when certain recurrent patterns are observed. In this case, we model recurrent patterns using rules. Experiments conducted on both synthetic and real-world datasets show that a priori knowledge enhances existing models and paves the way towards a more reliable information world. Source code as well as synthetic and real-world datasets are freely available.
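
The rationale stated in the abstract (reliable sources provide true information, and true information comes from reliable sources) can be illustrated with a minimal sketch. This is not the thesis's model or its released source code: it is a generic sum-based iteration, and the function name `truth_discovery`, the `ancestors` parameter, and the sample claims are all hypothetical. The optional `ancestors` map is one assumed way of encoding a partial order, so that a claim for a specific value also supports the more general values it implies.

```python
from collections import defaultdict


def truth_discovery(claims, ancestors=None, iterations=20):
    """Illustrative sum-based truth discovery (hypothetical helper).

    claims:    list of (source, item, value) triples.
    ancestors: optional dict mapping a value to the set of more general
               values it implies (an assumed encoding of a partial order).
    Returns (trust, confidence): source trust scores and per-(item, value)
    confidence scores, both scaled so that the maximum is 1.0.
    """
    if not claims:
        return {}, {}
    ancestors = ancestors or {}

    supporters = defaultdict(set)  # (item, value) -> sources supporting it
    claimed = defaultdict(set)     # source -> (item, value) it stated itself
    for source, item, value in claims:
        claimed[source].add((item, value))
        supporters[(item, value)].add(source)
        # A specific value also counts as support for its generalizations.
        for general in ancestors.get(value, ()):
            supporters[(item, general)].add(source)

    trust = {source: 1.0 for source in claimed}
    confidence = {}
    for _ in range(iterations):
        # Value confidence: sum of the trust of its supporting sources.
        confidence = {
            key: sum(trust[s] for s in srcs) for key, srcs in supporters.items()
        }
        top = max(confidence.values())
        confidence = {key: score / top for key, score in confidence.items()}

        # Source trust: sum of the confidence of the values it stated.
        trust = {
            source: sum(confidence[key] for key in keys)
            for source, keys in claimed.items()
        }
        top = max(trust.values())
        trust = {source: score / top for source, score in trust.items()}
    return trust, confidence


# Hypothetical claims: three sources disagree on two items.
claims = [
    ("s1", "item_a", "France"),
    ("s2", "item_a", "France"),
    ("s3", "item_a", "Italy"),
    ("s1", "item_b", "Italy"),
    ("s3", "item_b", "Spain"),
]
trust, confidence = truth_discovery(claims)
best_for_item_a = max(
    (key for key in confidence if key[0] == "item_a"), key=confidence.get
)
```

Passing `ancestors={"Paris": {"France"}}` would make a claim for "Paris" count as support for "France" as well, so the two values are no longer treated as conflicting. The rule-based patterns mentioned in the abstract are not covered by this sketch.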
Document type :
Theses
https://hal.archives-ouvertes.fr/tel-01914278
Contributor : Valentina Beretta
Submitted on : Thursday, November 15, 2018 - 5:19:59 PM
Last modification on : Monday, February 11, 2019 - 6:22:02 PM
Document(s) archived on : Saturday, February 16, 2019 - 12:21:18 PM

File

beretta_thesis_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01914278, version 1

Citation

Valentina Beretta. Data veracity assessment: enhancing Truth Discovery using a priori knowledge. Computer Science [cs]. IMT Mines Alès, 2018. English. ⟨tel-01914278⟩
