Multi-Label Knowledge Distillation

by Penghui Yang, et al.
Nanjing University of Aeronautics and Astronautics
The University of Tokyo

Existing knowledge distillation methods typically work by transferring the knowledge of output logits or intermediate feature maps from the teacher network to the student network, an approach that has been very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and the feature maps of a whole example may ignore minor classes. In this paper, we propose a novel multi-label knowledge distillation method. On the one hand, it exploits the informative semantic knowledge in the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method avoids knowledge counteraction among labels and thus achieves superior performance against diverse competing methods. Our code is available at:
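To make the first idea concrete, the per-label decomposition can be sketched as below. This is a minimal illustration of one-versus-all binary distillation, not the authors' exact loss: each label gets its own sigmoid probability (so probabilities need not sum to one, unlike a softmax), and the student is trained against the teacher's soft per-label targets with a binary cross-entropy. The function name, temperature value, and toy data are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Per-label binary distillation sketch: each of the L labels is
    treated as an independent two-class problem, so the multi-label
    task becomes L binary classification problems."""
    t = sigmoid(teacher_logits / temperature)  # soft teacher targets
    s = sigmoid(student_logits / temperature)  # student probabilities
    eps = 1e-7
    s = np.clip(s, eps, 1 - eps)  # numerical safety for the log terms
    # Binary cross-entropy of the student against the teacher's soft
    # targets, averaged over all labels and all instances in the batch.
    return float(-np.mean(t * np.log(s) + (1 - t) * np.log(1 - s)))

# Toy usage: a batch of 4 instances, each with 5 candidate labels.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 5))
student = rng.normal(size=(4, 5))
loss = binary_kd_loss(teacher, student)
```

Because each label contributes its own binary term, distilling one label cannot counteract the targets of another, which is the failure mode the abstract attributes to softmax-based logit distillation.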


Knowledge Distillation from Single to Multi Labels: an Empirical Study

Knowledge distillation (KD) has been extensively studied in single-label...

Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation

Knowledge distillation is a method of transferring the knowledge from a ...

Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer

Real-world recognition systems often encounter plenty of unseen labels...

Multi-Label Classification Neural Networks with Hard Logical Constraints

Multi-label classification (MC) is a standard machine learning problem i...

MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation

Knowledge Distillation (KD) has been one of the most popular methods to...

Multi-Scale Feature Extraction and Fusion for Online Knowledge Distillation

Online knowledge distillation conducts knowledge transfer among all stud...

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

Knowledge distillation (KD) has recently emerged as an efficacious schem...