2D Object Detection with Transformers: A Review

by Tahira Shehzadi, et al.

The astounding performance of Transformers in natural language processing (NLP) has motivated researchers to explore their use in computer vision tasks. The DEtection TRansformer (DETR) brings transformers to object detection by treating detection as a set prediction problem, which removes the need for proposal generation and post-processing steps. It is a state-of-the-art (SOTA) method for object detection, particularly in scenarios where the number of objects in an image is relatively small. Despite its success, DETR suffers from slow training convergence and reduced performance on small objects. Many improvements have therefore been proposed to address these issues, leading to substantial refinement of DETR. Since 2020, transformer-based object detection has attracted increasing interest and demonstrated impressive performance. Although numerous surveys cover transformers in vision in general, a review of advancements in 2D object detection using transformers is still missing. This paper gives a detailed review of twenty-one papers on recent developments in DETR. We begin with the basic modules of Transformers, such as self-attention, object queries and input feature encoding. Then, we cover the latest advancements in DETR, including backbone modification, query design and attention refinement. We also compare all detection transformers in terms of performance and network design. We hope this study will increase researchers' interest in solving the existing challenges of applying transformers to object detection. Researchers can follow newer improvements in detection transformers on this webpage: https://github.com/mindgarage-shan/trans_object_detection_survey



Vision Transformers: State of the Art and Research Challenges

Transformers have achieved great success in natural language processing....

k-means Mask Transformer

The rise of transformers in vision tasks not only advances network backb...

Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art

Transformers have rapidly gained popularity in computer vision, especial...

Visual Composite Set Detection Using Part-and-Sum Transformers

Computer vision applications such as visual relationship detection and h...

Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets

Vision Transformers have attracted a lot of attention recently since the...

CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

Due to the success of Bidirectional Encoder Representations from Transfo...

Transformer Assisted Convolutional Network for Cell Instance Segmentation

Region proposal based methods like R-CNN and Faster R-CNN models have pr...
