CTRL: Clustering Training Losses for Label Error Detection

08/17/2022
by Chang Yue, et al.

In supervised machine learning, correct labels are essential for high accuracy. Unfortunately, most datasets contain corrupted labels, and machine learning models trained on such datasets do not generalize well. Detecting label errors can therefore significantly improve the efficacy of these models. We propose a novel framework, CTRL (Clustering TRaining Losses for label error detection), to detect label errors in multi-class datasets. It works in two steps, based on the observation that models learn clean and noisy labels in different ways. First, we train a neural network on the noisy training dataset and record the loss curve of each sample. Then, we apply clustering algorithms to the training losses to group samples into two categories: cleanly labeled and noisily labeled. After label error detection, we remove the samples with noisy labels and retrain the model. Our experimental results demonstrate state-of-the-art error detection accuracy on both image (CIFAR-10 and CIFAR-100) and tabular datasets under simulated noise. We also provide a theoretical analysis that offers insight into why CTRL performs so well.
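
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a logistic regression stands in for the neural network, plain 2-means stands in for the unspecified clustering algorithm, the data are synthetic two-class Gaussian blobs with 20% of labels flipped, and every function name and parameter (`make_noisy_data`, `record_loss_curves`, `two_means`, `noise_rate`, and so on) is hypothetical.

```python
import math
import random


def make_noisy_data(n=200, noise_rate=0.2, seed=0):
    """Synthetic stand-in data: two Gaussian blobs with a fixed share of flipped labels."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        center = 2.0 if y == 1 else -2.0
        xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        ys.append(y)
    flipped = set(rng.sample(range(n), int(noise_rate * n)))
    noisy = [1 - y if i in flipped else y for i, y in enumerate(ys)]
    return xs, noisy, flipped


def record_loss_curves(xs, ys, epochs=50, lr=0.1):
    """Step 1: train on the noisy labels and keep each sample's loss at every epoch."""
    w1 = w2 = b = 0.0
    n = len(xs)
    curves = [[] for _ in xs]
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for i, ((x1, x2), y) in enumerate(zip(xs, ys)):
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            curves[i].append(-math.log(p if y == 1 else 1.0 - p))
            g1 += (p - y) * x1
            g2 += (p - y) * x2
            gb += p - y
        # Full-batch gradient descent step on the mean cross-entropy.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return curves


def two_means(curves, iters=20):
    """Step 2: split the loss curves into two clusters; return indices in the high-loss one."""
    def dist(a, c):
        return sum((u - v) ** 2 for u, v in zip(a, c))

    # Deterministic initialisation: the curves with the lowest and highest total loss.
    by_total = sorted(curves, key=sum)
    centroids = [list(by_total[0]), list(by_total[-1])]
    assign = [0] * len(curves)
    for _ in range(iters):
        assign = [0 if dist(c, centroids[0]) <= dist(c, centroids[1]) else 1
                  for c in curves]
        for k in (0, 1):
            members = [c for c, a in zip(curves, assign) if a == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    noisy_cluster = 0 if sum(centroids[0]) > sum(centroids[1]) else 1
    return {i for i, a in enumerate(assign) if a == noisy_cluster}


xs, noisy_labels, truly_flipped = make_noisy_data()
curves = record_loss_curves(xs, noisy_labels)
flagged = two_means(curves)

hits = len(flagged & truly_flipped)
precision = hits / max(len(flagged), 1)
recall = hits / len(truly_flipped)
print(f"flagged={len(flagged)} precision={precision:.2f} recall={recall:.2f}")

# Final step from the abstract: drop the flagged samples before retraining.
kept = [(x, y) for i, (x, y) in enumerate(zip(xs, noisy_labels)) if i not in flagged]
```

Clustering whole loss curves, rather than only the final loss, uses how each sample's loss evolves during training, which is the signal the abstract refers to. A linear model cannot memorize flipped labels the way a deep network can, so the separation in this toy setting is likely cleaner than it would be in practice.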

