Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

07/21/2023
by Jialiang Tang, et al.

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to privacy issues or data management considerations. Data-free knowledge distillation approaches have therefore been proposed that collect training instances from the Internet. However, most of them ignore the common distribution shift between the original training data and the webly collected data, which undermines the reliability of the trained student network. To solve this problem, we propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD^3), which consists of three components. Specifically, we first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks. Subsequently, we align both the weighted features and the classifier parameters of the two networks for knowledge memorization. Meanwhile, we build a new contrastive learning block called MixDistribution, which generates perturbed data with a new distribution for instance alignment, so that the student network can further learn a distribution-invariant representation. Extensive experiments on various benchmark datasets demonstrate that our proposed KD^3 outperforms state-of-the-art data-free knowledge distillation approaches.
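The abstract outlines these components but does not give implementation details. The sketch below is a minimal PyTorch illustration, under stated assumptions, of how two of the ideas could look in code: selecting web-collected instances by the combined teacher/student predictions, and a MixStyle-like perturbation of per-instance feature statistics to imitate data from a new distribution. All function names and hyperparameters (`alpha`, `threshold`, `beta`) are hypothetical choices for illustration, not the authors' released code.

```python
# Minimal sketch (assumptions, not the authors' implementation) of
# (1) confidence-based selection of webly collected instances and
# (2) a MixStyle-like "MixDistribution" perturbation of feature statistics.
import torch
import torch.nn.functional as F

def select_instances(teacher_logits, student_logits, alpha=0.5, threshold=0.9):
    """Keep web-collected samples whose combined teacher/student prediction
    is confident. `alpha` and `threshold` are hypothetical hyperparameters."""
    combined = alpha * F.softmax(teacher_logits, dim=1) \
             + (1 - alpha) * F.softmax(student_logits, dim=1)
    confidence, pseudo_labels = combined.max(dim=1)
    keep = confidence >= threshold          # boolean mask over the batch
    return keep, pseudo_labels

def mix_distribution(features, beta=0.1):
    """Perturb channel-wise feature statistics by mixing the mean/std of each
    instance with those of a randomly shuffled instance, producing features
    that look as if they came from a shifted distribution."""
    b = features.size(0)
    mu = features.mean(dim=[2, 3], keepdim=True)            # per-instance mean
    sigma = features.std(dim=[2, 3], keepdim=True) + 1e-6   # per-instance std
    perm = torch.randperm(b)
    lam = torch.distributions.Beta(beta, beta).sample((b, 1, 1, 1))
    mixed_mu = lam * mu + (1 - lam) * mu[perm]
    mixed_sigma = lam * sigma + (1 - lam) * sigma[perm]
    normalized = (features - mu) / sigma
    return normalized * mixed_sigma + mixed_mu

if __name__ == "__main__":
    # Shapes are illustrative: batch of 8, 10 classes, 64x32x32 feature maps.
    teacher_logits = torch.randn(8, 10)
    student_logits = torch.randn(8, 10)
    keep, labels = select_instances(teacher_logits, student_logits)
    feats = torch.randn(8, 64, 32, 32)
    perturbed = mix_distribution(feats)
    print(keep.sum().item(), perturbed.shape)
```

In such a sketch, the perturbed features would feed a contrastive objective that pulls together the clean and perturbed views of the same instance, which is one plausible way to encourage the distribution-invariant representation the abstract describes.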

Related research

05/20/2019 · Zero-Shot Knowledge Distillation in Deep Networks
Knowledge distillation deals with the problem of training a smaller mode...

08/29/2022 · How to Teach: Learning Data-Free Knowledge Distillation from Curriculum
Data-free knowledge distillation (DFKD) aims at training lightweight stu...

11/21/2019 · Few Shot Network Compression via Cross Distillation
Model compression has been widely adopted to obtain light-weighted deep ...

09/15/2022 · On-Device Domain Generalization
We present a systematic study of domain generalization (DG) for tiny neu...

02/08/2018 · Imitation networks: Few-shot learning of neural networks from scratch
In this paper, we propose imitation networks, a simple but effective met...

05/08/2023 · Web Content Filtering through knowledge distillation of Large Language Models
We introduce a state-of-the-art approach for URL categorization that lev...

12/12/2021 · Up to 100x Faster Data-free Knowledge Distillation
Data-free knowledge distillation (DFKD) has recently been attracting inc...