
A framework for the continuous curation of a knowledge base system

Abstract: Entity-centric knowledge graphs (KGs) are becoming increasingly popular for gathering information about entities. The schemas of KGs are semantically rich, with many different types and predicates to define entities and their relationships. This rich structure can express entities with semantic types and relationships, often domain-specific, that must be made explicit and understood to get the most out of the data; exploiting the knowledge in a KG therefore requires understanding its structure and patterns. Although many applications can benefit from such a rich structure, it comes at a price. A significant challenge with KGs is the quality of their data: without high-quality data, applications cannot rely on the KG. However, because KGs are created and updated automatically, they contain a large amount of noisy and inconsistent data, and because a KG contains a very large number of triples, manual validation is infeasible. In this thesis, we present different tools that can be used in the continuous creation and curation of KGs. We first present an approach for creating a KG in the accounting domain by matching entities. We then introduce methods for the continuous curation of KGs. We present an algorithm for conditional rule mining and apply it to large graphs. Next, we describe RuleHub, an extensible corpus of rules for public KGs that provides functionality for archiving and retrieving rules. Finally, we report methods for using logical rules in two applications: teaching soft rules to pre-trained language models (RuleBert) and explainable fact checking (ExpClaim).
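The abstract does not detail how rules are scored. As a generic illustration only (this is not the thesis's conditional rule-mining algorithm), the sketch below computes the support and standard confidence of one Horn rule over a toy KG under the closed-world assumption. The triples, the predicate names `marriedTo` and `hasChild`, and the function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy KG of (subject, predicate, object) triples.
KG: Set[Triple] = {
    ("alice", "marriedTo", "bob"),
    ("alice", "hasChild", "carol"),
    ("bob", "hasChild", "carol"),
    ("dave", "marriedTo", "eve"),
    ("dave", "hasChild", "frank"),
}


def body_bindings(kg: Set[Triple], body1: str, body2: str) -> Set[Tuple[str, str]]:
    """Distinct (y, z) pairs for which body1(x, y) AND body2(x, z) hold for some x."""
    first = [(s, o) for s, p, o in kg if p == body1]   # body1(x, y)
    second = [(s, o) for s, p, o in kg if p == body2]  # body2(x, z)
    return {(y, z) for x, y in first for x2, z in second if x == x2}


def support_and_confidence(kg: Set[Triple], body1: str, body2: str, head: str):
    """Score the rule body1(x, y) AND body2(x, z) => head(y, z).

    Support: predicted head facts already present in the KG.
    Standard confidence: support divided by the number of predictions
    (closed-world assumption: a missing fact counts as a counterexample).
    """
    predictions = body_bindings(kg, body1, body2)
    support = sum(1 for y, z in predictions if (y, head, z) in kg)
    confidence = support / len(predictions) if predictions else 0.0
    return support, confidence


if __name__ == "__main__":
    # Rule: marriedTo(x, y) AND hasChild(x, z) => hasChild(y, z)
    print(support_and_confidence(KG, "marriedTo", "hasChild", "hasChild"))
```

Here the rule predicts `hasChild(bob, carol)`, which the KG confirms, and `hasChild(eve, frank)`, which is absent, so the standard confidence is 0.5. A curation pipeline could treat high-confidence rules as candidates for flagging or completing missing facts, but that use is an assumption of this sketch.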
Contributor: ABES STAR
Submitted on: Monday, February 7, 2022 - 12:59:08 PM
Last modification on: Wednesday, February 9, 2022 - 3:46:03 AM
Long-term archiving on: Sunday, May 8, 2022 - 6:40:36 PM


Version validated by the jury (STAR)


HAL Id: tel-03560070, version 1


Naser Ahmadi. A framework for the continuous curation of a knowledge base system. Logic in Computer Science [cs.LO]. Sorbonne Université, 2021. English. ⟨NNT : 2021SORUS320⟩. ⟨tel-03560070⟩


