How to Measure the Reproducibility of System-oriented IR Experiments

10/26/2020
by Timo Breuer, et al.

Replicability and reproducibility of experimental results are primary concerns in all areas of science, and IR is no exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we have no means of assessing when a reproduced result actually counts as reproduced. Moreover, we lack any reproducibility-oriented dataset that would allow us to develop such methods. To address these issues, we compare several measures that objectively quantify the extent to which a system-oriented IR experiment has been replicated or reproduced. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists to the more general comparison of the obtained effects and significant differences. We also develop a reproducibility-oriented dataset, which allows us to validate our measures and can be used to develop future measures.
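
To make the idea of different levels of granularity concrete, here is a minimal Python sketch. The abstract does not name the individual measures, so the two choices below are assumptions made for illustration: Kendall's tau as a fine-grained comparison of two ranked lists, and the root mean square error (RMSE) over per-topic effectiveness scores as a coarser comparison. The function names, document ids, and scores are hypothetical and are not taken from the paper or from any library.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(original, reproduced):
    """Kendall's tau between two rankings of the same document ids.

    Assumes both lists contain exactly the same ids, without duplicates.
    Returns 1.0 for identical orderings and -1.0 for reversed orderings.
    """
    if len(set(original)) != len(original) or set(original) != set(reproduced):
        raise ValueError("rankings must contain the same unique document ids")
    if len(original) < 2:
        raise ValueError("at least two documents are needed")

    # Position of each document in the reproduced ranking.
    position = {doc: rank for rank, doc in enumerate(reproduced)}

    concordant = discordant = 0
    # combinations() yields pairs (a, b) where a precedes b in the original.
    for a, b in combinations(original, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def rmse(original_scores, reproduced_scores):
    """RMSE between per-topic scores (dicts mapping topic id to score)."""
    if original_scores.keys() != reproduced_scores.keys():
        raise ValueError("both runs must be scored on the same topics")
    squared = [
        (original_scores[t] - reproduced_scores[t]) ** 2 for t in original_scores
    ]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical rankings for a single topic.
    print(kendall_tau(["d1", "d2", "d3"], ["d1", "d3", "d2"]))
    # Hypothetical per-topic scores for an original and a reproduced run.
    print(rmse({"t1": 0.50, "t2": 0.30}, {"t1": 0.40, "t2": 0.40}))
```

The two functions answer different questions: the first asks whether the reproduced system orders documents the same way for a topic, while the second asks only whether the per-topic effectiveness is close, which can hold even when the rankings differ.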

research
01/19/2022

repro_eval: A Python Interface to Reproducibility Measures of System-oriented IR Experiments

In this work we introduce repro_eval - a tool for reactive reproducibili...
research
03/28/2023

A comment to "A General Theory of IR Evaluation Measures"

The paper "A General Theory of IR Evaluation Measures" develops a formal...
research
02/05/2021

Reproducibility in Evolutionary Computation

Experimental studies are prevalent in Evolutionary Computation (EC), and...
research
01/07/2021

Towards Meaningful Statements in IR Evaluation. Mapping Evaluation Measures to Interval Scales

Recently, it was shown that most popular IR measures are not interval-sc...
research
04/01/2022

A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

We present a simple and versatile framework for evaluating ranked lists ...
research
01/12/2019

The Dagstuhl Beginners Guide to Reproducibility for Experimental Networking Research

Reproducibility is one of the key characteristics of good science, but h...
research
11/09/2021

Test cases as a measurement instrument in experimentation

Background: Test suites are frequently used to quantify relevant softwar...
