DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

by Jiahao Chang, et al.

Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and their removal of post-processing operations, but their large models can be computationally expensive and difficult to deploy in real-world applications. To tackle this problem, knowledge distillation (KD) can be employed to compress a large model within a universal teacher-student learning framework. Unlike traditional CNN detectors, where distillation targets can be naturally aligned through the feature map, DETR casts object detection as a set prediction problem, leaving the relationship between teacher and student unclear during distillation. In this paper, we propose DETRDistill, a novel knowledge distillation method dedicated to DETR-families. We first explore a sparse matching paradigm with progressive stage-by-stage instance distillation. Considering the diverse attention mechanisms adopted in different DETRs, we propose an attention-agnostic feature distillation module to overcome the ineffectiveness of conventional feature imitation. Finally, to fully leverage the intermediate products of the teacher, we introduce teacher-assisted assignment distillation, which uses the teacher's object queries and assignment results to provide the student with additional guidance. Extensive experiments demonstrate that our distillation method achieves significant improvements on various competitive DETR approaches without introducing extra cost in the inference phase. To the best of our knowledge, this is the first systematic study to explore a general distillation method for DETR-style detectors.
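Because DETR produces an unordered set of query predictions, instance-level distillation first has to decide which student query corresponds to which teacher query. A minimal sketch of this sparse matching idea is shown below: teacher and student box predictions are paired by Hungarian matching on an L1 cost, and a regression loss is applied only to matched pairs. The function names and the L1-only cost are illustrative assumptions, not the paper's exact formulation (DETRDistill's full method also involves classification costs and stage-by-stage supervision).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_queries(teacher_boxes: np.ndarray, student_boxes: np.ndarray):
    """Pair teacher and student queries by Hungarian matching.

    Boxes are (N, 4) arrays in (cx, cy, w, h) format; the cost is the
    pairwise L1 distance between predicted boxes (a simplification --
    real DETR matching also includes a classification term).
    """
    cost = np.abs(teacher_boxes[:, None, :] - student_boxes[None, :, :]).sum(-1)
    t_idx, s_idx = linear_sum_assignment(cost)
    return t_idx, s_idx


def instance_distill_loss(teacher_boxes: np.ndarray,
                          student_boxes: np.ndarray) -> float:
    """L1 distillation loss over matched teacher-student query pairs."""
    t_idx, s_idx = match_queries(teacher_boxes, student_boxes)
    return float(np.abs(teacher_boxes[t_idx] - student_boxes[s_idx]).mean())
```

In a real pipeline this loss would be computed per decoder stage on the student's intermediate predictions, which is what "progressive stage-by-stage instance distillation" refers to.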
