Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder, ViT-Huge <cit.>, a DETR variant, DINO <cit.>, and an efficient DETR training method, Group DETR <cit.>. The training process consists of self-supervised pretraining and finetuning of a ViT-Huge encoder on ImageNet-1K, pretraining of the detector on Objects365, and finally finetuning of the detector on COCO. Group DETR v2 achieves 64.5 mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard (https://paperswithcode.com/sota/object-detection-on-coco).
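The training order described above can be written down as a small schedule. The following is only an illustrative sketch: the `Stage` class, its field names, and the `describe` helper are hypothetical and not taken from the authors' code, and splitting the ImageNet-1K step into two entries is an assumption about how the stages are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure, for illustration only)."""
    name: str
    dataset: str
    trains: str  # which part of the model this stage updates


# Stage order follows the abstract; everything else is an assumption.
PIPELINE = [
    Stage("self-supervised pretraining", "ImageNet-1K", "ViT-Huge encoder"),
    Stage("finetuning", "ImageNet-1K", "ViT-Huge encoder"),
    Stage("detection pretraining", "Objects365", "detector"),
    Stage("detection finetuning", "COCO", "detector"),
]


def describe(pipeline):
    """Return one human-readable line per stage, numbered from 1."""
    return [
        f"{i}. {s.name} of the {s.trains} on {s.dataset}"
        for i, s in enumerate(pipeline, 1)
    ]


if __name__ == "__main__":
    print("\n".join(describe(PIPELINE)))
```

Keeping the schedule as plain data makes the dependency explicit: each stage starts from the weights produced by the one before it.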