Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks

09/04/2019
by Sungho Shin, et al.

Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has also been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction. KD, however, introduces additional hyper-parameters into QDNN training, such as the temperature, the coefficient, and the size of the teacher network. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and we also introduce a simple and effective technique that reduces the coefficient during training. With KD employing the proposed hyper-parameters, we achieve a test accuracy of 92.7% on the CIFAR-10 data set, with corresponding results on CIFAR-100.
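To make the role of the two main hyper-parameters concrete, the sketch below shows a standard KD loss with a temperature T and a coefficient alpha, together with one possible way to "reduce the coefficient during training" via a linear annealing schedule. This is only an illustrative PyTorch-style sketch; the function names (kd_loss, linear_alpha), the default values, and the specific linear schedule are assumptions for exposition, not the paper's exact formulation.

```python
# Illustrative sketch only: a generic KD objective with temperature T and
# coefficient alpha, plus an assumed linear decay of alpha over training.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: (1 - alpha) * CE + alpha * T^2 * KL."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student log-probabilities at temperature T
        F.softmax(teacher_logits / T, dim=1),       # softened teacher probabilities
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    return (1.0 - alpha) * ce + alpha * kl


def linear_alpha(epoch, total_epochs, alpha_start=0.9, alpha_end=0.0):
    """One hypothetical 'coefficient reduction' schedule: linearly anneal
    alpha from alpha_start at the first epoch to alpha_end at the last."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return alpha_start + (alpha_end - alpha_start) * frac
```

In a training loop, one would typically compute alpha = linear_alpha(epoch, total_epochs) once per epoch and pass it to kd_loss; as alpha shrinks, the quantized student relies less on the teacher's soft targets and more on the hard labels.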
