Addressing Different Evaluation Environments for Information Retrieval through Pivot Systems

Abstract: Classical evaluations of Information Retrieval systems, under the Cranfield Paradigm, compare several systems within one evaluation environment, defined by its settings (document collection, topics, assessments and evaluation measures). In this paper, we propose a framework to handle the comparison of systems across several evaluation environments. To achieve this goal, we investigate the use of pivot systems, which allow an indirect comparison of systems across evaluation environments by computing Result Deltas, i.e. the differences between their evaluation measure values. We detail the proposed pivot-based methodology, define the characteristics of a pivot and present experiments to validate our proposal (and in particular the pivot characteristics). We create altered environments that differ in their topic sets using the 2018 and 2020 CLEF eHealth evaluation campaigns (Goeuriot et al., 2020). We explore the behaviour of the metrics and pivots by measuring the correlation between the Result Deltas, and by comparing the ranking of systems obtained through the pivots with the official ranking of the systems. Our experiments show that correlations can vary greatly according to the chosen pivot and metric. We show that some pivot/metric pairs achieve high correlation values across the altered environments, with a ranking of systems similar to the official ranking.
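The pivot idea in the abstract can be illustrated with a minimal sketch. It assumes that a Result Delta is the plain difference between a system's score and the pivot's score on the same metric within one environment; the paper itself may normalise or define the delta differently. The names `result_delta` and `rank_across_environments`, the environment labels and all scores are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def result_delta(system_score: float, pivot_score: float) -> float:
    """Assumed definition: plain difference between system and pivot scores."""
    return system_score - pivot_score


def rank_across_environments(
    environments: Dict[str, Dict[str, float]], pivot: str
) -> List[Tuple[Tuple[str, str], float]]:
    """Rank (environment, system) pairs by their delta to the shared pivot.

    `environments` maps an environment name to the metric scores of the
    systems evaluated in it; the pivot must have a score in every environment.
    """
    deltas = {}
    for env_name, scores in environments.items():
        pivot_score = scores[pivot]
        for system, score in scores.items():
            if system == pivot:
                continue  # the pivot is the reference point, not a candidate
            deltas[(env_name, system)] = result_delta(score, pivot_score)
    # Larger delta means the system beats the pivot by a wider margin.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: sys_1 and sys_2 are never evaluated in the same
# environment, but both can be related to the same pivot system.
environments = {
    "env_a": {"pivot": 0.50, "sys_1": 0.75},
    "env_b": {"pivot": 0.25, "sys_2": 0.75},
}
ranking = rank_across_environments(environments, "pivot")
for (env_name, system), delta in ranking:
    print(env_name, system, delta)
```

In this toy case both systems obtain the same raw score, yet they are ordered differently once each is measured against the pivot in its own environment, which is the kind of indirect comparison the abstract describes.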
Conference papers
Contributor: Philippe Mulhem
Submitted on: Monday, October 25, 2021 - 9:46:38 AM
Last modification on: Wednesday, November 17, 2021 - 1:57:26 PM


Gabriela González Sáez, Lorraine Goeuriot, Philippe Mulhem. Addressing Different Evaluation Environments for Information Retrieval through Pivot Systems. CORIA 2021, Apr 2021, Grenoble (virtuel), France. ⟨10.24348/coria.2021.long_6⟩. ⟨hal-03400576⟩



