Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

03/26/2021
by Curtis G. Northcutt, et al.

We algorithmically identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results. Errors in test sets are numerous and widespread: we estimate an average of 3.4% errors across the 10 datasets, where for example label errors comprise 6% of the ImageNet validation set. Putative label errors are found using confident learning and then human-validated via crowdsourcing (54% of the algorithmically-flagged candidates are indeed erroneously labeled). Surprisingly, we find that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data. For example, on ImageNet with corrected labels: ResNet-18 outperforms ResNet-50 if the prevalence of originally mislabeled test examples increases by just 6%. On CIFAR-10 with corrected labels: VGG-11 outperforms VGG-19 if the prevalence of originally mislabeled test examples increases by just 5%. Traditionally, machine learning practitioners choose which model to deploy based on test accuracy – our findings advise caution here, proposing that judging models over correctly labeled test sets may be more useful, especially for noisy real-world datasets.
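The flagging step described above can be sketched with a simplified per-class-threshold heuristic in the spirit of confident learning. This is an illustrative assumption, not the authors' full method: the `find_label_issues` function below and the toy arrays in the usage note are invented for this sketch. The idea is that class j's threshold is the average predicted probability of class j over examples given label j; an example is flagged when some other class clears its own threshold.

```python
import numpy as np

def find_label_issues(labels, pred_probs):
    """Simplified confident-learning-style sketch (not the full published
    algorithm): flag example i as a candidate label error if some class
    j != labels[i] has predicted probability >= the class-j threshold t_j,
    where t_j is the mean predicted probability of class j over all
    examples whose given label is j."""
    n, k = pred_probs.shape
    # Per-class thresholds: average self-confidence of examples labeled j.
    thresholds = np.array([
        pred_probs[labels == j, j].mean() if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    issues = []
    for i in range(n):
        # Classes whose predicted probability clears their own threshold,
        # excluding the given label itself.
        above = np.where(pred_probs[i] >= thresholds)[0]
        above = above[above != labels[i]]
        if above.size:
            issues.append(i)
    return np.array(issues, dtype=int)

# Toy usage: example 3 is labeled class 1 but the model is confident it
# is class 0, so it is flagged as a candidate label error.
labels = np.array([0, 0, 1, 1])
pred_probs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.2, 0.8],
                       [0.9, 0.1]])
candidates = find_label_issues(labels, pred_probs)  # -> array([3])
```

In the paper's pipeline, candidates flagged this way were then sent to human crowdsourced validation rather than trusted outright; the cleanlab library implements the full confident learning procedure.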


research
03/13/2023

Identifying Label Errors in Object Detection Datasets by Loss Inspection

Labeling datasets for supervised object detection is a dull and time-con...
research
09/12/2017

Learning with Bounded Instance- and Label-dependent Label Noise

Instance- and label-dependent label noise (ILN) is widely existed in rea...
research
01/25/2022

GMM Discriminant Analysis with Noisy Label for Each Class

Real world datasets often contain noisy labels, and learning from such d...
research
08/14/2023

Channel-Wise Contrastive Learning for Learning with Noisy Labels

In real-world datasets, noisy labels are pervasive. The challenge of lea...
research
11/25/2022

Identifying Incorrect Annotations in Multi-Label Classification Data

In multi-label classification, each example in a dataset may be annotate...
research
01/11/2023

Does progress on ImageNet transfer to real-world datasets?

Does progress on ImageNet transfer to real-world datasets? We investigat...
research
08/30/2021

Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee

The Bayes error rate (BER) is a fundamental concept in machine learning ...
