Robust Active Distillation

by Cenk Baykal, et al.

Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impair student performance. The sheer size of the teacher additionally constrains the number of soft-labels that can be queried due to prohibitive computational and/or financial costs. The difficulty of achieving efficiency (i.e., minimizing soft-label queries) and robustness (i.e., avoiding student inaccuracies due to incorrect labels) simultaneously hinders the widespread application of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. At the core of our work lies a game-theoretic formulation that explicitly considers the inherent trade-off between the informativeness and correctness of input instances. We establish bounds on the expected performance of our approach that hold even in worst-case distillation instances. We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to that of state-of-the-art active learning and active distillation methods.




