Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention

08/10/2023
by Liang Shang, et al.

Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved remarkable success in various vision tasks. However, many architectures do not consider interactions between feature maps from different stages and scales, which may limit their performance. In this work, we propose a simple add-on attention module to overcome these limitations via multi-stage and cross-scale interactions. Specifically, the proposed Multi-Stage Cross-Scale Attention (MSCSA) module takes feature maps from different stages to enable multi-stage interactions and achieves cross-scale interactions by computing self-attention at different scales based on the multi-stage feature maps. Our experiments on several downstream tasks show that MSCSA provides a significant performance boost with modest additional FLOPs and runtime.
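
To make the two kinds of interaction concrete, here is a minimal NumPy sketch of the idea described above. It is an illustration under stated assumptions, not the authors' implementation: the average pooling, the random projection matrices (stand-ins for learned weights), the residual connection, and every function and parameter name (`avg_pool`, `multi_stage_cross_scale_attention`, `kv_factors`, `dim`) are hypothetical choices made for this example. It also assumes each stage's height and width are integer multiples of the coarsest stage's, with the same ratio in both directions.

```python
import numpy as np


def avg_pool(x, factor):
    """Average-pool a (C, H, W) array by an integer factor that divides H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_stage_cross_scale_attention(stage_maps, kv_factors=(1, 2), dim=16, seed=0):
    """Illustrative sketch only; names and design details are assumptions."""
    rng = np.random.default_rng(seed)

    # Multi-stage interaction: pool every stage to the coarsest resolution
    # and concatenate along the channel axis.
    target_h = min(m.shape[1] for m in stage_maps)
    fused = np.concatenate(
        [avg_pool(m, m.shape[1] // target_h) for m in stage_maps], axis=0
    )
    c, h, w = fused.shape

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v = (rng.standard_normal((c, dim)) / np.sqrt(c) for _ in range(3))
    w_o = rng.standard_normal((dim, c)) / np.sqrt(dim)

    # Queries come from the fused map at full (coarsest-stage) resolution.
    tokens = fused.reshape(c, h * w).T  # shape (N, C)
    q = tokens @ w_q

    # Cross-scale interaction: keys and values are built from the fused map
    # pooled at several scales, then concatenated along the token axis.
    kv_tokens = np.concatenate(
        [avg_pool(fused, f).reshape(c, -1).T for f in kv_factors], axis=0
    )
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v

    # Scaled dot-product attention, followed by a residual connection.
    attn = softmax(q @ k.T / np.sqrt(dim), axis=-1)
    out = tokens + attn @ v @ w_o
    return out.T.reshape(c, h, w), attn


# Three hypothetical backbone stages with shapes (channels, height, width).
example_rng = np.random.default_rng(1)
stages = [
    example_rng.standard_normal((8, 16, 16)),
    example_rng.standard_normal((16, 8, 8)),
    example_rng.standard_normal((32, 4, 4)),
]
out, attn = multi_stage_cross_scale_attention(stages)
print(out.shape, attn.shape)
```

In this sketch each of the 16 query positions attends over 20 key/value tokens (16 at the original scale plus 4 at the pooled scale), so a single attention step mixes information across both stages and scales. Pooling the keys and values is one way to keep the added cost small; whether the module itself uses pooling for this is not stated in the abstract.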
