Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

06/13/2022
by Zengyu Qiu, et al.

Knowledge distillation (KD) has shown very promising capabilities in transferring learned representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers grows, existing KD methods fail to achieve better results. Our work shows that 'prior knowledge' is vital to KD, especially when large teachers are applied. Specifically, we propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation. This means our method treats the teacher's features not only as the 'target' but also as part of the 'input'. In addition, we dynamically adjust the ratio of prior knowledge during training according to the feature gap, thus guiding the student at an appropriate level of difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e., CIFAR100 and ImageNet) and an object detection benchmark (i.e., MS COCO). The results demonstrate the superiority of our method under varying settings. More importantly, with DPK the performance of the student model is positively correlated with that of the teacher model, which means we can further boost student accuracy by applying larger teachers. Our code will be publicly available for reproducibility.
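The abstract only sketches the mechanism, so here is a minimal illustrative sketch of how mixing teacher features into the student at a gap-dependent ratio could look in PyTorch. All names (mix_ratio_from_gap, dpk_feature_loss, max_ratio) and the channel-wise mixing scheme are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the dynamic-prior-knowledge idea: part of the
# student's features is replaced by the teacher's before distillation,
# and the replacement ratio follows the current feature gap.
import torch
import torch.nn.functional as F


def mix_ratio_from_gap(f_s: torch.Tensor, f_t: torch.Tensor,
                       max_ratio: float = 0.5) -> float:
    """Map the student-teacher feature gap to a mixing ratio:
    a larger gap injects more teacher features as prior knowledge."""
    gap = F.mse_loss(f_s, f_t).item()
    return max_ratio * gap / (gap + 1.0)  # squashed into [0, max_ratio)


def dpk_feature_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Replace a gap-dependent fraction of student feature channels with the
    (detached) teacher features, then compute the feature-distillation loss."""
    f_t = f_t.detach()  # the teacher is frozen during distillation
    ratio = mix_ratio_from_gap(f_s, f_t)
    # Random channel mask: 1 keeps the student feature, 0 injects the teacher prior.
    keep = (torch.rand(f_s.size(1), device=f_s.device) > ratio).float()
    keep = keep.view(1, -1, 1, 1)
    mixed = keep * f_s + (1.0 - keep) * f_t
    return F.mse_loss(mixed, f_t)


# Usage inside a training step, assuming matched feature maps of shape (B, C, H, W).
f_student = torch.randn(8, 256, 14, 14, requires_grad=True)
f_teacher = torch.randn(8, 256, 14, 14)
loss = dpk_feature_loss(f_student, f_teacher)
loss.backward()
```

Under this reading, positions taken from the teacher contribute no gradient (their loss term is zero), so the student is only penalized on the portion it must still produce itself, which is one plausible way to modulate training difficulty.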


Related research

03/31/2021 · Fixing the Teacher-Student Knowledge Discrepancy in Distillation
Training a small student network with the guidance of a larger teacher n...

09/27/2022 · PROD: Progressive Distillation for Dense Retrieval
Knowledge distillation is an effective way to transfer knowledge from a ...

11/13/2019 · Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
Despite the recent works on knowledge distillation (KD) have achieved a ...

03/03/2015 · Robustly Leveraging Prior Knowledge in Text Classification
Prior knowledge has been shown very useful to address many natural langu...

06/17/2021 · Dynamic Knowledge Distillation with A Single Stream Structure for RGB-D Salient Object Detection
RGB-D salient object detection (SOD) demonstrates its superiority on dete...

11/25/2022 · Privileged Prior Information Distillation for Image Matting
Performance of trimap-free image matting methods is limited when trying ...

08/23/2019 · Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation
Recent developments in NLP have been accompanied by large, expensive mod...
