Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

by Ganlong Zhao, et al.

Deep models trained with noisy labels are prone to over-fitting and generalize poorly. Most existing solutions rest on the idealized assumption that label noise is class-conditional, i.e., that instances of the same class share the same noise model and that the noise is independent of features. In practice, however, real-world noise patterns are usually more fine-grained and instance-dependent, which poses a major challenge, especially in the presence of inter-class imbalance. In this paper, we propose a two-stage clean-sample identification method to address this challenge. First, we employ a class-level feature clustering procedure for the early identification of clean samples that lie near the class-wise prediction centers. Notably, we address the class imbalance problem by aggregating rare classes according to their prediction entropy. Second, for the remaining clean samples that lie close to the ground-truth class boundary (where they are usually mixed with samples carrying instance-dependent noise), we propose a novel consistency-based classification method that identifies them using the consistency of two classifier heads: the higher the consistency, the larger the probability that a sample is clean. Extensive experiments on several challenging benchmarks demonstrate that our method outperforms state-of-the-art approaches.
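As a rough illustration of the two stages described in the abstract, the sketch below selects samples near their class center and then scores samples by the agreement of two classifier heads. It is not the authors' implementation. The abstract does not specify the clustering procedure, the consistency measure, or any thresholds, so the per-class mean center, the Euclidean radius, the cosine similarity between softmax outputs, and all function names here are assumptions chosen for illustration. The entropy-based aggregation of rare classes is omitted.

```python
import math

def class_centers(features, labels):
    """Mean feature vector per (possibly noisy) label. Assumed stand-in for clustering."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def select_by_centrality(features, labels, radius):
    """Stage 1 (assumed form): keep samples within `radius` of their class center."""
    centers = class_centers(features, labels)
    return [i for i, (f, y) in enumerate(zip(features, labels))
            if math.dist(f, centers[y]) <= radius]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_consistency(logits_a, logits_b):
    """Stage 2 (assumed metric): cosine similarity of the two heads' softmax outputs."""
    p, q = softmax(logits_a), softmax(logits_b)
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm

def select_by_consistency(head_logits, threshold=0.9):
    """Keep samples whose two heads agree at least as much as `threshold`."""
    return [i for i, (a, b) in enumerate(head_logits)
            if head_consistency(a, b) >= threshold]

# Toy stage-1 data: sample 3 carries label 0 but lies far from the other label-0 samples.
features = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [3.0, 3.0], [5.0, 5.0], [5.2, 5.0]]
labels = [0, 0, 0, 0, 1, 1]
central_idx = select_by_centrality(features, labels, radius=1.5)

# Toy stage-2 data: logits from two hypothetical heads for three boundary samples.
head_logits = [
    ([4.0, 0.0, 0.0], [3.5, 0.2, 0.1]),  # heads agree
    ([4.0, 0.0, 0.0], [0.0, 4.0, 0.0]),  # heads disagree
    ([1.0, 1.2, 0.9], [1.1, 1.0, 1.0]),  # both uncertain, but similar
]
consistent_idx = select_by_consistency(head_logits)
```

In this toy run, `central_idx` excludes the outlying sample 3 and `consistent_idx` excludes the pair on which the two heads disagree. Note that cosine similarity also accepts the third pair, where both heads are uncertain but similar, so a different consistency measure or an additional confidence check may be preferable in practice.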




