Identifying Statistical Bias in Dataset Replication

05/19/2020
by Logan Engstrom, et al.

Dataset replication is a useful tool for assessing whether improvements in test accuracy on a specific benchmark correspond to improvements in models' ability to generalize reliably. In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality. We show that after correcting for the identified statistical bias, only an estimated 3.6% ± 1.5% of the original 11.7% ± 1.0% accuracy drop remains unaccounted for. We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication. Code for our study is publicly available at http://github.com/MadryLab/dataset-replication-analysis.
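To illustrate how a human-in-the-loop quality measure can introduce statistical bias, the following is a minimal Python sketch and not the authors' analysis. It assumes each image has a hidden "true" quality score, that the score is estimated from a small number of simulated annotator votes, and that images are kept when the noisy estimate clears a threshold. The function name, the Beta(2, 2) prior, the number of annotators, and the threshold are all hypothetical choices made for illustration.

```python
import random


def simulate_selection_bias(num_images=20000, num_annotators=10,
                            threshold=0.7, seed=0):
    """Select images by a noisy quality estimate and compare the mean
    estimated quality with the mean true quality of the selected set.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    selected_estimates = []
    selected_true = []
    for _ in range(num_images):
        # Hidden true quality of the image (assumed Beta(2, 2) distributed).
        true_quality = rng.betavariate(2.0, 2.0)
        # Each simulated annotator votes "good" with probability true_quality.
        votes = sum(rng.random() < true_quality for _ in range(num_annotators))
        estimate = votes / num_annotators
        # Keep the image only if the noisy estimate clears the threshold.
        if estimate >= threshold:
            selected_estimates.append(estimate)
            selected_true.append(true_quality)
    mean_estimate = sum(selected_estimates) / len(selected_estimates)
    mean_true = sum(selected_true) / len(selected_true)
    return mean_estimate, mean_true


mean_estimate, mean_true = simulate_selection_bias()
print(f"mean estimated quality of selected images: {mean_estimate:.3f}")
print(f"mean true quality of selected images:      {mean_true:.3f}")
```

Under these assumptions the selected images look better by the noisy estimate than they are by the hidden true score, because selection favors images whose estimates happened to err upward. A dataset matched to a reference on the estimated statistic can therefore still differ from it in true quality.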
