Student-friendly Knowledge Distillation

05/18/2023
by Mengyang Yuan, et al.

In knowledge distillation, the knowledge transferred from the teacher model is often too complex for the student model to process thoroughly. However, good teachers in real life simplify complex material before teaching it to their students. Inspired by this observation, we propose student-friendly knowledge distillation (SKD), which simplifies the teacher's output into new knowledge representations and thereby makes learning easier and more effective for the student model. SKD consists of a softening process and a learning simplifier. First, the softening process uses a temperature hyperparameter to soften the teacher's output logits, which simplifies the output to some extent and makes it easier for the learning simplifier to process. The learning simplifier then uses an attention mechanism to further simplify the teacher's knowledge and is trained jointly with the student model under the distillation loss; the simplification is therefore coupled to the student's training objective, which ensures that the simplified teacher knowledge representation is well suited to the specific student model. Furthermore, since SKD does not change the form of the distillation loss, it can easily be combined with other distillation methods based on logits or on intermediate-layer features to enhance its effectiveness, giving SKD wide applicability. Experimental results on the CIFAR-100 and ImageNet datasets show that our method achieves state-of-the-art performance while maintaining high training efficiency.
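To make the two components concrete, below is a minimal PyTorch sketch of how temperature softening and an attention-based learning simplifier could be wired into a distillation loss. The names (LearningSimplifier, skd_loss), the choice to treat each class logit as an attention token, and hyperparameters such as embed_dim and T are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearningSimplifier(nn.Module):
    """Illustrative attention-based simplifier (assumed design, not the authors' code)."""

    def __init__(self, embed_dim: int = 32, num_heads: int = 1):
        super().__init__()
        self.embed = nn.Linear(1, embed_dim)        # lift each class logit to a token embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out = nn.Linear(embed_dim, 1)          # project each token back to a scalar logit

    def forward(self, softened_logits: torch.Tensor) -> torch.Tensor:
        # (B, C) -> (B, C, 1): treat every class logit as one token in a sequence.
        tokens = self.embed(softened_logits.unsqueeze(-1))
        attended, _ = self.attn(tokens, tokens, tokens)   # self-attention over the C classes
        return self.out(attended).squeeze(-1)             # back to (B, C)


def skd_loss(student_logits, teacher_logits, simplifier, T: float = 4.0):
    # Step 1: soften the teacher's output logits with the temperature hyperparameter T.
    softened = teacher_logits / T
    # Step 2: simplify the softened knowledge; because the distillation loss is
    # backpropagated through the simplifier, it is trained jointly with the student.
    simplified = simplifier(softened)
    # Standard KL-divergence distillation loss on temperature-scaled distributions.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(simplified, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

In a training loop, the parameters of both the student and the simplifier would be passed to the same optimizer while the teacher stays frozen, so that the simplified knowledge representation adapts to the specific student model, as the abstract describes.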


