
Hierarchical and temporal analysis of scientific corpora as tools for the history of science

Ian Jeantet
Abstract : This thesis develops automatic analyses of the raw text of scientific publications in support of quantitative epistemology. The ultimate goal is to produce maps of the evolution of scientific domains that help epistemologists identify the mechanisms at play. We first propose to enrich insight into the structure of science with a new hierarchical structure, the quasi-dendrogram, which can be seen as a specific kind of directed acyclic graph. We propose a framework, including a new overlapping hierarchical clustering (OHC) algorithm, to generate such a hierarchy from the text of scientific papers. A major difficulty is the absence of ground truth; we therefore propose a new similarity measure that compares hierarchies by estimating how well their levels of the same size match. Finally, we propose an alternative method for generating evolutionary maps of scientific domains from a user query. An evolutionary map is defined as a set of timelines obtained by following aligned hierarchies across consecutive periods. We define a probability of evolution that, when used as a threshold, produces more robust evolutionary maps.
Contributor : Ian Jeantet
Submitted on : Wednesday, January 13, 2021 - 1:35:58 PM
Last modification on : Saturday, January 16, 2021 - 3:27:52 AM




  • HAL Id : tel-03108773, version 1


Ian Jeantet. Hierarchical and temporal analysis of scientific corpora as tools for the history of science. Information Retrieval [cs.IR]. Université de Rennes 1 (UR1), 2021. English. ⟨tel-03108773⟩


