ResT V2: Simpler, Faster and Stronger

04/15/2022
by Qing-Long Zhang, et al.

This paper proposes ResTv2, a simpler, faster, and stronger multi-scale vision Transformer for visual recognition. ResTv2 simplifies the EMSA structure in ResTv1 (i.e., it eliminates the multi-head interaction part) and employs an upsample operation to reconstruct the medium- and high-frequency information lost to the downsampling operation. In addition, we explore different techniques for better applying ResTv2 backbones to downstream tasks. We found that although combining EMSAv2 and window attention can greatly reduce the theoretical matrix multiply FLOPs, it may significantly decrease the computation density, thus causing lower actual speed. We comprehensively validate ResTv2 on ImageNet classification, COCO detection, and ADE20K semantic segmentation. Experimental results show that the proposed ResTv2 can outperform recent state-of-the-art backbones by a large margin, demonstrating the potential of ResTv2 as a solid backbone. The code and models will be made publicly available at <https://github.com/wofmanaf/ResT>.
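To make the phrase "theoretical matrix multiply FLOPs" concrete, the following is a rough counting sketch, not the authors' code. It counts only the multiply-accumulates of the two attention matrix products (Q times K transposed, and the attention map times V) for three generic variants: global attention, attention with spatially downsampled keys and values, and non-overlapping window attention. The function names, the 56x56 feature map, the channel width of 96, the reduction factor of 8, and the window size of 7 are all illustrative assumptions and do not come from the abstract.

```python
def attention_matmul_macs(num_queries: int, num_keys: int, dim: int) -> int:
    """Multiply-accumulates for Q @ K^T plus attn @ V.

    Softmax, linear projections, and any down/upsampling layers are ignored.
    """
    return 2 * num_queries * num_keys * dim


def global_attention_macs(height: int, width: int, dim: int) -> int:
    # Every token attends to every other token.
    n = height * width
    return attention_matmul_macs(n, n, dim)


def downsampled_kv_attention_macs(height: int, width: int, dim: int,
                                  reduction: int) -> int:
    # Assumption: keys/values come from a map downsampled by `reduction` per side.
    n = height * width
    n_kv = (height // reduction) * (width // reduction)
    return attention_matmul_macs(n, n_kv, dim)


def window_attention_macs(height: int, width: int, dim: int,
                          window: int) -> int:
    # Assumption: non-overlapping windows; each token attends only inside its window.
    n = height * width
    return attention_matmul_macs(n, window * window, dim)


if __name__ == "__main__":
    h = w = 56      # hypothetical feature-map size
    d = 96          # hypothetical channel width
    full = global_attention_macs(h, w, d)
    reduced = downsampled_kv_attention_macs(h, w, d, reduction=8)
    windowed = window_attention_macs(h, w, d, window=7)
    print(f"global:         {full:,}")
    print(f"downsampled KV: {reduced:,} ({full // reduced}x fewer)")
    print(f"windowed:       {windowed:,} ({full // windowed}x fewer)")
```

These counts are purely theoretical. As the abstract notes, a lower count does not guarantee higher throughput, because splitting the work into many small matrix products can reduce computation density on real hardware.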


