Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

02/01/2023
by Chenglong Wang, et al.

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, multiple types of knowledge are typically extracted from the teacher model, and the challenge is to make full use of them when training the student model. Our preliminary study shows that: (1) not all of the knowledge is necessary for learning a good student model, and (2) knowledge distillation can benefit from different knowledge at different training steps. In response to these findings, we propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation. In addition, we offer a refinement of the training algorithm to ease the computational burden. Experimental results on the GLUE datasets show that our method significantly outperforms several strong knowledge distillation baselines.
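
To make the idea concrete, the sketch below illustrates one possible way an actor-critic selector could reweight several distillation losses at each training step. It is a minimal illustration only: the assumed knowledge types (logit KD, hidden-state MSE, attention-map MSE), the state features, the reward definition, and the network sizes are assumptions for exposition, not details taken from the paper.

# Minimal sketch (assumptions throughout): an actor-critic "knowledge selector"
# that reweights several distillation losses per step. Not the paper's actual
# formulation; reward, state, and knowledge types are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_KNOWLEDGE = 3  # e.g., soft logits, hidden states, attention maps (assumed)

class Actor(nn.Module):
    """Maps a training-state vector to a weight distribution over knowledge types."""
    def __init__(self, state_dim: int, num_knowledge: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                 nn.Linear(32, num_knowledge))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return F.softmax(self.net(state), dim=-1)  # knowledge-selection weights

class Critic(nn.Module):
    """Estimates the value of the current training state."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def distillation_step(per_knowledge_losses, step_frac, actor, critic,
                      ac_optimizer, reward):
    """One step: weight the per-knowledge losses with the actor, then update
    actor/critic from a scalar reward (e.g., drop in student dev loss)."""
    # State: current per-knowledge losses plus training progress (assumed features).
    state = torch.cat([torch.stack([l.detach() for l in per_knowledge_losses]),
                       torch.tensor([step_frac])])
    weights = actor(state)
    # Detach the weights when forming the student loss so the student update and
    # the selector update stay separate in this toy example.
    student_loss = sum(w.detach() * l for w, l in zip(weights, per_knowledge_losses))

    # Simplified advantage-weighted update for the selector, plus a value loss.
    value = critic(state)
    advantage = reward - value.detach()
    policy_loss = -(torch.log(weights + 1e-8) * advantage).sum()
    value_loss = F.mse_loss(value, torch.tensor(reward))
    ac_optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    ac_optimizer.step()
    return student_loss  # in real training, backpropagate this through the student

# Toy usage with dummy scalars standing in for real KD loss terms.
actor, critic = Actor(NUM_KNOWLEDGE + 1, NUM_KNOWLEDGE), Critic(NUM_KNOWLEDGE + 1)
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)
dummy_losses = [torch.tensor(1.2), torch.tensor(0.7), torch.tensor(0.3)]
loss = distillation_step(dummy_losses, step_frac=0.1, actor=actor, critic=critic,
                         ac_optimizer=opt, reward=0.05)
print(loss)

In a full training loop, the returned weighted loss would drive the student's optimizer, while the selector would be updated from a reward signal measured on held-out data; both choices here are assumptions made for the sketch.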


