Highlight Every Step: Knowledge Distillation via Collaborative Teaching

07/23/2019
by Haoran Zhao, et al.

High storage and computational costs prevent deep neural networks from being deployed on resource-constrained devices. Knowledge distillation aims to train a compact student network by transferring knowledge from a larger pre-trained teacher model. However, most existing knowledge distillation methods overlook the valuable information generated during the teacher's training process and rely only on its final outputs. In this paper, we propose a new Collaborative Teaching Knowledge Distillation (CTKD) strategy that employs two special teachers. Specifically, one teacher trained from scratch (i.e., the scratch teacher) assists the student step by step using its temporary outputs, steering the student along a near-optimal path towards accurate final logits. The other, pre-trained teacher (i.e., the expert teacher) guides the student to focus on the critical regions that are most useful for the task. Combining the knowledge from these two teachers significantly improves the performance of the student network. Experiments on the CIFAR-10, CIFAR-100, SVHN, and Tiny ImageNet datasets verify that the proposed knowledge distillation method is efficient and achieves state-of-the-art performance.
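As a rough illustration of how the two guidance signals described above could be combined, the PyTorch-style sketch below joins three terms: a standard cross-entropy loss, a KL-divergence term that matches the student's softened logits to the scratch teacher's temporary outputs at the current training step, and an attention-map term that pulls the student's spatial activations towards those of the pre-trained expert teacher. This is a minimal sketch under stated assumptions (one intermediate feature map per network, attention-transfer-style region matching); the function and argument names (ctkd_style_loss, scratch_logits, expert_feat, alpha, beta, T) are illustrative and not the authors' implementation.

import torch
import torch.nn.functional as F

def attention_map(feat):
    # Collapse the channel dimension into a spatial attention map and
    # L2-normalize it per sample (attention-transfer-style summary).
    att = feat.pow(2).mean(dim=1).flatten(1)   # (N, C, H, W) -> (N, H*W)
    return F.normalize(att, p=2, dim=1)

def ctkd_style_loss(student_logits, scratch_logits,
                    student_feat, expert_feat,
                    labels, T=4.0, alpha=0.9, beta=1e3):
    # Supervised term on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Step-by-step guidance: match the student's softened predictions
    # to the scratch teacher's temporary outputs at this training step.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(scratch_logits / T, dim=1),
                  reduction="batchmean") * T * T

    # Region guidance: align the student's spatial attention with the
    # pre-trained expert teacher's attention over the critical regions.
    at = (attention_map(student_feat) - attention_map(expert_feat)).pow(2).mean()

    return ce + alpha * kd + beta * at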
