
Hierarchical and temporal analysis of scientific corpora as tools for the history of science

Ian Jeantet
Abstract : This thesis aims to provide automatic analyses of the raw text of scientific publications for quantitative epistemology. The final goal is to produce maps of the evolution of scientific domains that help epistemologists determine the mechanisms at stake. We first propose to enrich the insights into the structure of science with a new hierarchical structure called a quasi-dendrogram, which can be seen as a specific directed acyclic graph. We propose a framework including a new overlapping hierarchical clustering (OHC) algorithm to generate such a hierarchy from the text of scientific papers. Because one of the major issues was the absence of ground truth, we propose a new similarity measure that compares hierarchies by estimating how well their levels of the same size match. Finally, we propose an alternative method to generate evolutionary maps of scientific domains from a user query. An evolutionary map is defined as a set of timelines obtained by following aligned hierarchies across consecutive periods. We define a probability of evolution that, when used as a threshold, produces more robust evolutionary maps.
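The abstract describes evolutionary maps only at a high level. Below is a minimal Python sketch of the thresholding idea, not the thesis's actual method, under simplifying assumptions: each period contributes a flat list of cluster labels instead of a full aligned quasi-dendrogram, and the probability of evolution between a cluster of period t and one of period t+1 is supplied as a precomputed input. The function names `build_links` and `timelines`, the cluster labels and the probability values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (period t, cluster in period t, cluster in period t + 1).
Link = Tuple[int, str, str]
Node = Tuple[int, str]


def build_links(probs: Dict[Link, float], threshold: float) -> Dict[Node, List[str]]:
    """Keep only links whose (assumed, precomputed) evolution probability reaches the threshold."""
    links: Dict[Node, List[str]] = {}
    for (period, src, dst), p in probs.items():
        if p >= threshold:
            links.setdefault((period, src), []).append(dst)
    return links


def timelines(clusters: List[List[str]], probs: Dict[Link, float], threshold: float) -> List[List[Node]]:
    """Enumerate maximal chains of kept links through consecutive periods."""
    links = build_links(probs, threshold)
    # Clusters reached by at least one kept link cannot start a timeline.
    has_incoming = {(period + 1, dst) for (period, _src), dsts in links.items() for dst in dsts}
    result: List[List[Node]] = []

    def extend(path: List[Node]) -> None:
        period, name = path[-1]
        successors = sorted(links.get((period, name), []))
        if not successors:  # no kept link leaves this cluster: the timeline ends here
            result.append(path)
            return
        for dst in successors:
            extend(path + [(period + 1, dst)])

    # Links only go from period t to t + 1, so the graph is acyclic and recursion terminates.
    for period, names in enumerate(clusters):
        for name in names:
            if (period, name) not in has_incoming:
                extend([(period, name)])
    return result


if __name__ == "__main__":
    clusters = [["A", "B"], ["C", "D"], ["E"]]
    probs = {(0, "A", "C"): 0.9, (0, "B", "C"): 0.2, (0, "B", "D"): 0.7,
             (1, "C", "E"): 0.8, (1, "D", "E"): 0.4}
    print(timelines(clusters, probs, threshold=0.5))
    print(len(timelines(clusters, probs, threshold=0.0)))
```

In this toy example, raising the threshold from 0.0 to 0.5 drops the weak links B→C and D→E, so three overlapping timelines (A→C→E, B→C→E, B→D→E) reduce to two (A→C→E and B→D). This only loosely mirrors the abstract's claim that thresholding on a probability of evolution yields more robust maps; the thesis's own definition of that probability is not reproduced here.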
Contributor : ABES STAR
Submitted on : Tuesday, January 4, 2022 - 10:09:21 AM
Last modification on : Friday, August 5, 2022 - 2:54:52 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03108773, version 2


Ian Jeantet. Hierarchical and temporal analysis of scientific corpora as tools for the history of science. Information Retrieval [cs.IR]. Université Rennes 1, 2021. English. ⟨NNT : 2021REN1S048⟩. ⟨tel-03108773v2⟩


