BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning

04/04/2022
by Zhi Hou, et al.

Attention mechanisms have become very popular in deep neural networks, and the Transformer architecture has achieved great success not only in natural language processing but also in visual recognition. Recently, a new Transformer module applied along the batch dimension rather than the spatial/channel dimension, i.e., BatchFormer [18], was introduced to explore sample relationships for overcoming data-scarcity challenges. However, it only works with image-level representations for classification. In this paper, we devise a more general batch Transformer module, BatchFormerV2, which further enables exploring sample relationships for dense representation learning. Specifically, when applying the proposed module, we employ a two-stream pipeline during training, i.e., one stream with and one stream without the BatchFormerV2 module, so that the BatchFormerV2 stream can be removed for testing. The proposed method is therefore a plug-and-play module and can be easily integrated into different vision Transformers without any extra inference cost. Without bells and whistles, we show the effectiveness of the proposed method on a variety of popular visual recognition tasks, including image classification and two important dense prediction tasks: object detection and panoptic segmentation. In particular, BatchFormerV2 consistently improves current DETR-based detection methods (e.g., DETR, Deformable-DETR, Conditional DETR, and SMCA) by over 1.3%. Code will be made publicly available.
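The abstract describes two ideas: self-attention applied along the batch dimension of dense feature maps, and a two-stream training scheme with a shared head so the module can be dropped at inference. The following PyTorch sketch illustrates both ideas under stated assumptions; it is not the authors' released implementation, and the names BatchFormerV2Block and two_stream_forward, the hyperparameters, and the residual connection are illustrative assumptions.

```python
# Minimal sketch, not the authors' released code: a BatchFormerV2-style block that
# runs Transformer self-attention across the samples of a mini-batch at every
# spatial location, plus a two-stream forward pass that shares one prediction head.
import torch
import torch.nn as nn


class BatchFormerV2Block(nn.Module):
    """Self-attention along the batch dimension of a dense feature map."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # A standard Transformer encoder layer; hyperparameters are illustrative.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a backbone or vision Transformer stage.
        b, c, h, w = x.shape
        # Reinterpret the tensor so that the "sequence" seen by the encoder is the
        # batch: shape (S=B, N=H*W, E=C), i.e. the B samples attend to each other
        # at every spatial location independently.
        tokens = x.permute(0, 2, 3, 1).reshape(b, h * w, c)
        tokens = self.encoder(tokens)
        return tokens.reshape(b, h, w, c).permute(0, 3, 1, 2)


def two_stream_forward(features: torch.Tensor,
                       batchformer: BatchFormerV2Block,
                       head: nn.Module) -> torch.Tensor:
    """Shared-head two-stream training step (residual connection is an assumption).

    Both the plain stream and the batch-attended stream pass through the same head,
    so the backbone and head remain effective when BatchFormerV2 is removed at test
    time.
    """
    plain = features                                  # stream without the module
    refined = features + batchformer(features)        # stream with the module
    return head(torch.cat([plain, refined], dim=0))   # duplicated batch, shared head
```

At test time, under these assumptions, one would simply call head(features) and skip the BatchFormerV2 block, which is why the module adds no inference cost.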


Related research

03/03/2022 · BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
Despite the success of deep neural networks, there are still many challe...

05/29/2021 · Less is More: Pay Less Attention in Vision Transformers
Transformers have become one of the dominant architectures in deep learn...

11/19/2022 · Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach
Exploring sample relationships within each mini-batch has shown great po...

07/30/2021 · DPT: Deformable Patch-based Transformer for Visual Recognition
Transformer has achieved great success in computer vision, while how to ...

10/11/2021 · Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Studies on self-supervised visual representation learning (SSL) improve ...

07/19/2022 · Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective
Visual representation learning is the key of solving various vision prob...

06/07/2022 · Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning
Unsupervised representation learning methods like SwAV are proved to be ...
