
Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources

Abstract: In biomedicine, the explosion of textual knowledge sources has introduced formidable challenges for knowledge-aware information systems. Traditional knowledge acquisition methods have proven costly, resource intensive and time consuming. Automating large-scale knowledge acquisition requires narrowing the semantic gap between biomedical texts and structured representations. In this context, this study proposes a framework for acquiring knowledge from biomedical texts. The framework helps reduce the effort, time and cost of ontology acquisition and so eases the ontology acquisition bottleneck. The proposed framework approximates, models, structures and ontologizes implicit knowledge buried in biomedical texts. The semantic disambiguator approximates biomedical artefacts from biomedical texts. The conceptual disambiguator models and structures the biomedical knowledge abstracted from the domain texts. Ontologization then gives an explicit interpretation of the biomedical artefacts and conceptualizations. The components of the framework were instantiated with scientific and clinical text documents and produced about four million concepts and seven million associations. This set of artefacts is structured into the lower ontological knowledge structure, while the upper ontology structure is reused from existing ontologies. The conceptual structure is represented with a graph formalism. The formal interpretation is based on OWL DL language primitives and constructs and generates a set of OWL DL axioms, referred to as the OWL ontology (Ko). The extent of approximation and the quality of the structural design are evaluated using criteria-based methods. A set of metrics measures each criterion, and the results are encouraging. Correctness measurements for concept entities are 70% for accuracy, 82% for completeness, 68% for conciseness and 100% for consistency.
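The abstract says extracted concepts and associations are given an OWL DL interpretation, and that this interpretation is limited to existential restrictions. The exact mapping is not described here. The following is therefore only a minimal sketch, assuming taxonomic links become `SubClassOf` axioms and every other association (subject, relation, object) becomes `SubClassOf(S ObjectSomeValuesFrom(r O))` in OWL 2 functional-style syntax. The helper names `to_iri` and `to_owl_axioms`, the `:` prefix and the sample concepts are hypothetical illustrations, not the thesis's implementation.

```python
import re


def to_iri(label: str) -> str:
    """Turn a free-text concept label into a local IRI under a hypothetical ':' prefix."""
    # Replace every run of non-word characters (spaces, hyphens, ...) with '_'.
    return ":" + re.sub(r"\W+", "_", label.strip())


def to_owl_axioms(is_a, associations):
    """Map extracted knowledge to OWL 2 functional-syntax axioms.

    is_a:         list of (child, parent) concept pairs -> plain SubClassOf axioms
    associations: list of (subject, relation, object)  -> existential restrictions only;
                  no cardinality (e.g. ObjectExactCardinality) is produced, because
                  none is available from the extracted associations.
    """
    axioms = []
    for child, parent in is_a:
        axioms.append(f"SubClassOf({to_iri(child)} {to_iri(parent)})")
    for subj, rel, obj in associations:
        axioms.append(
            f"SubClassOf({to_iri(subj)} ObjectSomeValuesFrom({to_iri(rel)} {to_iri(obj)}))"
        )
    return axioms


if __name__ == "__main__":
    for axiom in to_owl_axioms(
        [("Tuberculosis", "Infection")],
        [("Isoniazid", "treats", "Tuberculosis")],
    ):
        print(axiom)
    # SubClassOf(:Tuberculosis :Infection)
    # SubClassOf(:Isoniazid ObjectSomeValuesFrom(:treats :Tuberculosis))
```

The sketch makes the stated limitation concrete. An association such as "Isoniazid treats Tuberculosis" can only be read as "treats some Tuberculosis". A statement like "exactly one" would need cardinality information that this kind of extraction does not keep.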
Quality measurement showed a complex ontology structure, with metric values of 986,448 for vocabulary size, 18.73 for connectivity density, 145,246 for tree impurity and 226,698 for graph entropy. The ontology schema metrics have values of 0.80 for relationship richness, 3 for attribute richness and 13,253 for inheritance richness. Ontology clarity showed average readability, with 3 attributes on average. The proposed framework is limited in acquiring individuals and entity attributes. It also loses cardinality information when acquiring ontological knowledge. These gaps limit the formal interpretation of biomedical semantics, which in turn is restricted to interpretations based only on existential restrictions. As a way forward, the study recommends enhancing the semantic disambiguation and ontologization components of the framework so that they can accommodate the acquisition of cardinality and attribute information.
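The abstract reports schema and structural metrics without stating their formulas. The following is a minimal sketch that computes four of them on a toy taxonomy, assuming the common OntoQA definitions:

* Relationship richness: non-taxonomic relations divided by all relations.
* Inheritance richness: subclass links per class.
* Attribute richness: attributes per class.

It also assumes Gangemi et al.'s tree impurity, which counts is-a edges beyond a spanning tree. The thesis's own definitions and normalizations may differ. The `OntologySchema` type and all function names are hypothetical. Graph entropy and connectivity density are left out because their definitions vary across the literature.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologySchema:
    """Hypothetical toy schema: classes, is-a edges, non-taxonomic relations, attributes."""
    classes: set[str]
    subclass_of: set[tuple[str, str]]      # (child, parent) is-a edges
    relations: set[tuple[str, str, str]]   # (domain, property, range), non-taxonomic
    attributes: dict[str, int] = field(default_factory=dict)  # class -> attribute count


def relationship_richness(s: OntologySchema) -> float:
    """OntoQA-style RR = |P| / (|SC| + |P|): share of non-inheritance relations."""
    total = len(s.relations) + len(s.subclass_of)
    return len(s.relations) / total if total else 0.0


def inheritance_richness(s: OntologySchema) -> float:
    """OntoQA-style IR: average number of direct subclass links per class."""
    return len(s.subclass_of) / len(s.classes) if s.classes else 0.0


def attribute_richness(s: OntologySchema) -> float:
    """OntoQA-style AR: average number of attributes per class."""
    return sum(s.attributes.values()) / len(s.classes) if s.classes else 0.0


def tree_impurity(s: OntologySchema) -> int:
    """Is-a edges beyond a spanning tree: |E| - (|N| - 1).

    Assumes the taxonomy is connected (here, everything hangs off one root);
    0 means a pure tree, and each extra parent link adds 1.
    """
    return len(s.subclass_of) - (len(s.classes) - 1)


# Toy biomedical taxonomy; Tuberculosis has two parents, making the graph impure.
example = OntologySchema(
    classes={"Thing", "Disease", "Drug", "Infection", "Cancer", "Antibiotic", "Tuberculosis"},
    subclass_of={
        ("Disease", "Thing"), ("Drug", "Thing"),
        ("Infection", "Disease"), ("Cancer", "Disease"),
        ("Antibiotic", "Drug"),
        ("Tuberculosis", "Infection"), ("Tuberculosis", "Disease"),
    },
    relations={("Drug", "treats", "Disease"), ("Antibiotic", "treats", "Infection")},
    attributes={"Drug": 2, "Disease": 1},
)

if __name__ == "__main__":
    print(f"RR  = {relationship_richness(example):.2f}")  # RR  = 0.22
    print(f"IR  = {inheritance_richness(example):.2f}")   # IR  = 1.00
    print(f"AR  = {attribute_richness(example):.2f}")     # AR  = 0.43
    print(f"TIP = {tree_impurity(example)}")              # TIP = 1
```

With these definitions, relationship richness near 1 means the schema is dominated by non-taxonomic associations rather than is-a links. Tree impurity grows with every concept that has more than one parent.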

Cited literature: 335 references
Contributor: Georges Quénot
Submitted on: Friday, October 18, 2019 - 8:21:56 AM
Last modification on: Tuesday, October 6, 2020 - 4:20:08 PM
Long-term archiving on: Sunday, January 19, 2020 - 12:41:15 PM


Files produced by the author(s)


  • HAL Id : tel-02087577, version 1



Demeke Asres Ayele. Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources. Information Retrieval [cs.IR]. Université d'Addis Abeba, 2016. English. ⟨tel-02087577⟩




