Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

12/15/2022
by Loic Themyr, et al.

Transformers have proved very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have recently shown success in semantic segmentation, but they can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. It includes learnable global tokens which, unlike those of previous methods, can model interactions between all image regions, and it extracts powerful representations during training. Extensive experiments show that GLAM-Swin and GLAM-Swin-UNet perform substantially better than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.
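To make the idea of global tokens in a window-based transformer concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the block layout, so the two-step scheme below (attention inside each window over its local and global tokens, then attention among the global tokens of all windows) is assumed, as are the single head, the identity query/key/value projections, and the omission of residual connections, normalization, and MLP layers. The names `self_attention` and `global_token_block` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head scaled dot-product self-attention.
    # ASSUMPTION: identity query/key/value projections, for brevity.
    # x has shape (batch, n_tokens, dim).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def global_token_block(windows, global_tokens):
    # Hypothetical block, assumed layout (not taken from the abstract).
    # windows:       (num_windows, tokens_per_window, dim)
    # global_tokens: (num_windows, g, dim)
    nw, n, d = windows.shape
    g = global_tokens.shape[1]

    # Step 1: attention inside each window over [global tokens, local tokens].
    seq = np.concatenate([global_tokens, windows], axis=1)
    seq = self_attention(seq)
    glob, local = seq[:, :g], seq[:, g:]

    # Step 2: the global tokens of all windows attend to one another,
    # which is the only place where information crosses window borders.
    glob = self_attention(glob.reshape(1, nw * g, d)).reshape(nw, g, d)
    return local, glob


rng = np.random.default_rng(0)
windows = rng.normal(size=(4, 16, 8))       # 4 windows, 16 tokens each, dim 8
global_tokens = rng.normal(size=(4, 2, 8))  # 2 global tokens per window
local, glob = global_token_block(windows, global_tokens)
```

In this sketch, the local tokens of a window only see their own window during one block, while its global tokens already mix information from every window. Stacking such blocks would let the local tokens read the updated global tokens in the next block, which is one way all image regions could come to interact.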


