Differences Between Hard and Noisy-labeled Samples: An Empirical Study

by Mahsa Forouzesh, et al.

Extracting noisy or incorrectly labeled samples from a labeled dataset that also contains hard (difficult) samples is an important yet under-explored problem. Two general and often independent lines of work exist: one addresses noisy labels, and the other deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which degrades the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our systematic empirical study enables us to better understand the similarities and, more importantly, the differences between hard-to-learn samples and incorrectly labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. We study various data partitioning methods in the presence of label noise and observe that separating noisy samples from hard samples with the proposed metric yields the best datasets, as evidenced by the high test accuracy of models trained on the filtered datasets. We demonstrate this both for our synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.
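To make the notion of a controlled "noisiness level" concrete, the following is a minimal sketch of one common way to inject synthetic label noise: flipping a fixed fraction of labels uniformly to a different class (symmetric noise). This is an illustrative assumption, not the authors' dataset construction, which the abstract does not specify; the function name `inject_symmetric_label_noise` and its parameters are hypothetical, and the sketch does not model sample hardness.

```python
import random


def inject_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fraction `noise_rate` of labels to a different class.

    Hypothetical helper for illustration only. Returns the noisy labels
    and a boolean mask marking which samples were corrupted, so that a
    filtering method can later be scored against the ground truth.
    """
    rng = random.Random(seed)
    n = len(labels)
    num_flips = round(noise_rate * n)

    noisy_labels = list(labels)
    is_noisy = [False] * n

    # Choose which samples to corrupt, without replacement.
    for i in rng.sample(range(n), num_flips):
        # Pick uniformly among all classes except the true one,
        # so every selected sample really ends up mislabeled.
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy_labels[i] = rng.choice(wrong_classes)
        is_noisy[i] = True

    return noisy_labels, is_noisy


if __name__ == "__main__":
    clean = [i % 3 for i in range(100)]
    noisy, mask = inject_symmetric_label_noise(clean, num_classes=3, noise_rate=0.2)
    print(sum(mask), "of", len(clean), "labels were flipped")
```

Keeping the corruption mask alongside the noisy labels is what makes such an experiment controlled: any method that claims to separate noisy samples from hard ones can be checked directly against the known set of flipped labels.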


Sample Prior Guided Robust Model Learning to Suppress Noisy Labels

Imperfect labels are ubiquitous in real-world datasets and seriously har...

Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise

Deep neural networks have proven to be highly effective when large amoun...

Friends and Foes in Learning from Noisy Labels

Learning from examples with noisy labels has attracted increasing attent...

Deep k-NN for Noisy Labels

Modern machine learning models are often trained on examples with noisy ...

LNL+K: Learning with Noisy Labels and Noise Source Distribution Knowledge

Learning with noisy labels (LNL) is challenging as the model tends to me...

Few-shot Learning with Noisy Labels

Few-shot learning (FSL) methods typically assume clean support sets with...

Prime Sample Attention in Object Detection

It is a common paradigm in object detection frameworks to treat all samp...
