Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels

01/02/2023
by Yikai Wang, et al.

A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a novel clean sample selection framework with theoretical guarantees for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified as the samples whose mean-shift parameters are solved to be zero in the regression model. We theoretically show that SPR can recover clean data under some conditions. In general scenarios, these conditions may no longer be satisfied, and some noisy data may be falsely selected as clean data. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) in the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
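To make the mean-shift idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the model Y = Xβ + γ + ε with a row-wise group penalty on γ, a single fixed penalty level `lam`, and a plain proximal-gradient solver; the function name `spr_select_clean`, the toy data, and all parameter values are hypothetical. The knockoff filter, FSR control, and the split algorithm described in the abstract are not covered here.

```python
import numpy as np

def spr_select_clean(X, Y, lam, n_iter=500):
    """Illustrative mean-shift regression: Y = X @ beta + gamma + noise.

    Rows whose estimated gamma is exactly zero are treated as clean.
    (Assumed formulation for illustration only.)
    """
    n = X.shape[0]
    # Residual-maker M = I - X X^+ projects out the part explained by beta.
    M = np.eye(n) - X @ np.linalg.pinv(X)
    R = M @ Y
    G = np.zeros_like(Y, dtype=float)
    # M is an orthogonal projection, so the gradient of
    # 0.5 * ||R - M G||^2 is 1-Lipschitz and a unit step size is valid.
    for _ in range(n_iter):
        Z = G - (M @ G - R)  # gradient step (uses M @ R == R)
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Row-wise group soft-thresholding: small rows become exactly zero.
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        G = scale * Z
    clean = np.linalg.norm(G, axis=1) == 0.0
    return clean, G

# Toy data (hypothetical): 3 classes, 20 samples each, 3 flipped labels.
rng = np.random.default_rng(0)
true_class = np.repeat(np.arange(3), 20)
X = np.eye(3)[true_class] + 0.01 * rng.standard_normal((60, 3))
noisy_labels = true_class.copy()
flipped = np.array([0, 20, 40])
noisy_labels[flipped] = (true_class[flipped] + 1) % 3
Y = np.eye(3)[noisy_labels]

clean_mask, gamma = spr_select_clean(X, Y, lam=0.5)
print("flagged as noisy:", np.flatnonzero(~clean_mask))
```

In this sketch the choice of `lam` decides how large a residual must be before a sample is flagged, which is exactly the kind of tuning that a data-adaptive selection rule is meant to replace.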

research
03/15/2022

Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels

Noisy training set usually leads to the degradation of generalization an...
research
06/08/2023

A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels

Learning with noisy labels is an important topic for scalable training i...
research
11/20/2022

SplitNet: Learnable Clean-Noisy Label Splitting for Learning with Noisy Labels

Annotating the dataset with high-quality labels is crucial for performan...
research
09/02/2023

Regularly Truncated M-estimators for Learning with Noisy Labels

The sample selection approach is very popular in learning with noisy lab...
research
08/26/2023

Late Stopping: Avoiding Confidently Learning from Mislabeled Examples

Sample selection is a prevalent method in learning with noisy labels, wh...
research
09/14/2022

Few Clean Instances Help Denoising Distant Supervision

Existing distantly supervised relation extractors usually rely on noisy ...
research
06/30/2020

Early-Learning Regularization Prevents Memorization of Noisy Labels

We propose a novel framework to perform classification via deep learning...
