HRViT: Multi-Scale High-Resolution Vision Transformer

11/01/2021
by Jiaqi Gu, et al.

Vision transformers (ViTs) have attracted much attention for their superior performance on computer vision tasks. To address their limitations of single-scale low-resolution representations, prior work adapts ViTs to high-resolution dense prediction tasks with hierarchical architectures to generate pyramid features. However, multi-scale representation learning is still under-explored on ViTs, given their classification-like sequential topology. To enhance ViTs with more capability to learn semantically-rich and spatially-precise multi-scale representations, in this work, we present an efficient integration of high-resolution multi-branch architectures with vision transformers, dubbed HRViT, pushing the Pareto front of dense prediction tasks to a new level. We explore heterogeneous branch design, reduce the redundancy in linear layers, and augment the model nonlinearity to balance the model performance and hardware efficiency. The proposed HRViT achieves 50.20% mIoU on ADE20K and 83.16% mIoU on Cityscapes for semantic segmentation tasks, surpassing state-of-the-art MiT and CSWin with an average of +1.78 mIoU improvement, 28% parameter reduction, and 21% FLOPs reduction, demonstrating the potential of HRViT as strong vision backbones.
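The results above are reported in mIoU (mean intersection-over-union), the standard semantic segmentation metric. As a minimal sketch of how that metric is defined, and not the authors' evaluation code, the following pure-Python function computes mIoU over flattened per-pixel class labels. The function name `mean_iou` and the toy label lists are hypothetical; real benchmarks such as ADE20K and Cityscapes accumulate counts over the whole validation set and typically skip an "ignore" label, which this sketch omits.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (hypothetical flattened label maps).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes absent from both pred and target are excluded from the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. An improvement of +1.78 mIoU in the abstract refers to this quantity expressed in percentage points.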

