RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

10/13/2022
by Jian Wang, et al.

Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate, due to the time-consuming computation mechanism of transformers. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves a better trade-off between performance and efficiency than CNN-based models. To achieve high inference efficiency on GPU-like devices, RTFormer leverages GPU-Friendly Attention with linear complexity and discards the multi-head mechanism. In addition, we find that cross-resolution attention is more efficient for gathering global context information for the high-resolution branch, because it spreads the high-level knowledge learned by the low-resolution branch. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of RTFormer: it achieves state-of-the-art results on Cityscapes, CamVid and COCOStuff, and shows promising results on ADE20K. Code is available at PaddleSeg: https://github.com/PaddlePaddle/PaddleSeg.
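The abstract names two attention mechanisms without spelling out their form. As a rough illustration only, the pure-Python sketch below contrasts them under stated assumptions. First, `memory_attention` is single-head attention against a small, fixed set of M learnable memory slots with double normalization, following the external-attention formulation; it stands in for "attention with linear complexity" and is not RTFormer's exact GPU-Friendly Attention. Second, `cross_resolution_attention` is modeled as high-resolution tokens querying low-resolution tokens, with no learned projections. All function and variable names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def memory_attention(x, mem_k, mem_v):
    """Hypothetical single-head attention against M learnable memory slots.

    x: N x d tokens; mem_k, mem_v: M x d. Cost is O(N * M * d), i.e. linear
    in the token count N for a fixed M (external-attention style, NOT the
    exact RTFormer GPU-Friendly Attention).
    """
    scores = matmul(x, transpose(mem_k))               # N x M
    # Double normalization, step 1: softmax over the token axis (N),
    # separately for each memory slot.
    attn = transpose(softmax_rows(transpose(scores)))  # N x M
    # Step 2: L1-normalize each token's weights over the memory axis (M).
    attn = [[w / sum(row) for w in row] for row in attn]
    return matmul(attn, mem_v)                         # N x d


def cross_resolution_attention(x_high, x_low):
    """Hypothetical cross-resolution attention without learned projections.

    Queries are the N_h high-resolution tokens; keys and values are the
    N_l low-resolution tokens. Cost is O(N_h * N_l * d) instead of the
    O(N_h^2 * d) of self-attention on the high-resolution branch.
    """
    d = len(x_high[0])
    scores = matmul(x_high, transpose(x_low))          # N_h x N_l
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), x_low)         # N_h x d


if __name__ == "__main__":
    x_high = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # N_h=4, d=2
    x_low = [[1.0, 0.0], [0.0, 1.0]]                            # N_l=2
    mem_k = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]               # M=3
    mem_v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(memory_attention(x_high, mem_k, mem_v))
    print(cross_resolution_attention(x_high, x_low))
```

With M fixed, the first function's cost grows linearly with the number of tokens N rather than quadratically. The second replaces the O(N_h^2) cost of high-resolution self-attention with O(N_h·N_l), where the low-resolution token count N_l is much smaller than N_h.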
