Towards Comparable Knowledge Distillation in Semantic Image Segmentation

by Onno Niemann, et al.

Knowledge Distillation (KD) is one proposed solution to the problems of large model size and slow inference speed in semantic segmentation. In our research, we identify 25 proposed distillation loss terms from 14 publications of the last 4 years. Unfortunately, comparing these terms based on published results is often impossible because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase in student mIoU of 4.54 percentage points and a final performance of 29.19, while Adaptive Perspective Distillation (APD) improves student performance by only 2.06 percentage points yet achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and the resulting underperformance of the student model used as the reference point. In our work, we reveal the problem of insufficient hyperparameter tuning by showing that the distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve the comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
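To make the setting concrete: distillation frameworks for segmentation typically combine the usual cross-entropy loss with one or more distillation terms computed from the teacher's outputs. The sketch below shows a plain pixel-wise KL-divergence term (the common base term in such frameworks, not any specific loss proposed in the paper), assuming PyTorch; the function name and temperature default are illustrative.

```python
import torch
import torch.nn.functional as F

def pixelwise_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 2.0) -> torch.Tensor:
    """Pixel-wise KL divergence between teacher and student class
    distributions for dense prediction.

    Both logit tensors have shape (N, C, H, W): N images, C classes,
    and H x W spatial positions.
    """
    # Soften both distributions with temperature T along the class axis.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # 'batchmean' sums KL over classes and pixels and divides by N;
    # dividing by H*W yields the mean per pixel, and T^2 rescales the
    # gradient magnitude as in the original KD formulation.
    n_pixels = student_logits.shape[2] * student_logits.shape[3]
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * (T * T) / n_pixels
```

In a full training setup this term would be weighted by a hyperparameter and added to the standard cross-entropy loss on ground-truth labels; the paper's point is that the tuning of exactly such weights (and of the baseline student's own hyperparameters) drives much of the reported variance between methods.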




