Quantified Reproducibility Assessment of NLP Results

04/12/2022
by Anya Belz, et al.

This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility scores that are comparable across multiple reproductions not only of the same, but of different original studies. We find that the proposed method facilitates insights into causes of variation between reproductions, and allows conclusions to be drawn about what changes to system and/or evaluation design might lead to improved reproducibility.
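As a rough illustration of what a single degree-of-reproducibility score could look like, the sketch below computes a coefficient of variation over the original score and its reproductions, with a small-sample correction. The abstract does not name the statistic, so the choice of a corrected coefficient of variation, the function name `degree_of_reproducibility`, and the example scores are all assumptions made for illustration, not the authors' stated method.

```python
from statistics import mean, stdev


def degree_of_reproducibility(scores):
    """Small-sample-corrected coefficient of variation, in percent.

    Assumption: `scores` holds the original result plus its reproductions
    for one system and one evaluation measure. Lower values mean the
    results agree more closely. This is an illustrative statistic, not
    necessarily the one used in the paper.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need the original score and at least one reproduction")
    m = mean(scores)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Unbiased sample standard deviation relative to the mean, as a percentage.
    cv = stdev(scores) / abs(m) * 100
    # Common small-sample correction factor for the coefficient of variation.
    return (1 + 1 / (4 * n)) * cv


# Hypothetical scores: one original result followed by two reproductions.
example_scores = [27.4, 26.9, 28.1]
print(round(degree_of_reproducibility(example_scores), 2))
```

Because the statistic is a ratio of spread to mean, it is unaffected by rescaling the measure, which is one reason a score of this kind can be compared across reproductions of different original studies.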

research
09/02/2021

Quantifying Reproducibility in NLP and ML

Reproducibility has become an intensely debated topic in NLP and ML over...
research
06/16/2023

Reproducibility in NLP: What Have We Learned from the Checklist?

Scientific progress in NLP rests on the reproducibility of researchers' ...
research
02/05/2021

Reproducibility in Evolutionary Computation

Experimental studies are prevalent in Evolutionary Computation (EC), and...
research
03/14/2021

A Systematic Review of Reproducibility Research in Natural Language Processing

Against the background of what has been termed a reproducibility crisis ...
research
05/24/2022

Generative Models for Reproducible Coronary Calcium Scoring

Purpose: Coronary artery calcium (CAC) score, i.e. the amount of CAC qua...
research
06/03/2022

[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement

Combating bias in NLP requires bias measurement. Bias measurement is alm...
research
02/08/2023

Towards Inferential Reproducibility of Machine Learning Research

Reliability of machine learning evaluation – the consistency of observed...
