Self-discipline on multiple channels

by Jiutian Zhao et al.
Wuhan University of Technology

Self-distillation relies on a model's own information to improve its generalization ability and holds considerable promise. Existing self-distillation methods require additional models, model modification, or an enlarged batch size for training, which increases the difficulty of use, memory consumption, and computational cost. This paper develops Self-discipline on multiple channels (SMC), which combines consistency regularization with self-distillation using the concept of multiple channels. Conceptually, SMC consists of two steps: 1) each channel's data is passed through the model simultaneously to obtain its corresponding soft label, and 2) the soft label saved in the previous step is read together with the soft label obtained from the current channel's data through the model to compute the loss function. SMC uses consistency regularization and self-distillation to improve both the generalization ability of the model and its robustness to noisy labels. We name the variant of SMC containing only two channels SMC-2. Comparative experiments on two datasets show that SMC-2 outperforms Label Smoothing Regularization and Self-Distillation From The Last Mini-Batch on all models, and outperforms the state-of-the-art Sharpness-Aware Minimization method on 83 models. Compatibility experiments show that using both SMC-2 and data augmentation together improves the generalization ability of the model by at least 0.28. Finally, the label-noise interference experiments show that SMC-2 curbs the decline in the model's generalization ability in the late training period caused by label noise. The code is available at
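The two-step procedure above can be sketched as a loss function. The exact objective is not given in this abstract, so the sketch below assumes a common form for combining consistency regularization with mutual self-distillation between two channels: a supervised cross-entropy term on each channel's predictions, plus a symmetric, temperature-scaled KL term that pushes the two channels' soft labels toward each other. The names `smc2_loss`, `T`, and `alpha` are illustrative, not the authors' notation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, computed stably per row."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def smc2_loss(logits_a, logits_b, labels, T=3.0, alpha=0.5):
    """Assumed SMC-2-style objective (illustrative, not the paper's exact loss).

    logits_a, logits_b: model outputs for the two channels of the same batch
    labels: integer class labels, shape (n,)
    T: distillation temperature; alpha: weight of the consistency term
    """
    n = labels.shape[0]
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    # Supervised cross-entropy, averaged over the two channels.
    ce = -(np.log(p_a[np.arange(n), labels] + 1e-12).mean()
           + np.log(p_b[np.arange(n), labels] + 1e-12).mean()) / 2
    # Symmetric KL between the channels' tempered soft labels:
    # each channel's saved soft label distills the other channel.
    qa = softmax(logits_a, T)
    qb = softmax(logits_b, T)
    kl = ((qa * (np.log(qa + 1e-12) - np.log(qb + 1e-12))).sum(axis=1).mean()
          + (qb * (np.log(qb + 1e-12) - np.log(qa + 1e-12))).sum(axis=1).mean()) / 2
    # T**2 rescaling is the usual convention for tempered distillation losses.
    return ce + alpha * (T ** 2) * kl
```

When the two channels agree exactly, the KL term vanishes and only the supervised loss remains; disagreement between the channels raises the loss, which is what drives the consistency regularization.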




