Efficient Hybrid Transformer: Learning Global-local Context for Urban Scene Segmentation

09/18/2021
by   Libo Wang, et al.

Semantic segmentation of fine-resolution urban scene images plays a vital role in many practical applications, such as land cover mapping, urban change detection, environmental protection and economic assessment. Driven by rapid developments in deep learning, convolutional neural networks (CNNs) have dominated the semantic segmentation task for many years. CNNs adopt a hierarchical feature representation and are strong at extracting local context. However, the local nature of the convolution layer prevents the network from capturing global information, which is crucial for improving fine-resolution image segmentation. Recently, Transformers have become a hot topic in computer vision. The Vision Transformer demonstrates a strong capability for global information modelling, boosting many vision tasks, such as image classification, object detection and especially semantic segmentation. In this paper, we propose an efficient hybrid Transformer (EHT) for semantic segmentation of urban scene images. EHT combines the advantages of CNNs and Transformers, learning global-local context to strengthen the feature representation. Extensive experiments demonstrate that EHT is more efficient than state-of-the-art benchmark methods while remaining competitive in accuracy. Specifically, the proposed EHT achieves a score of 67.0 on the UAVid test set and significantly outperforms other lightweight models. The code will be available soon.

research
11/29/2021

BuildFormer: Automatic building extraction with vision transformer

Building extraction from fine-resolution remote sensing images plays a v...
research
08/03/2022

SSformer: A Lightweight Transformer for Semantic Segmentation

It is well believed that Transformer performs better in semantic segment...
research
07/01/2023

Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Domain-generalized urban-scene semantic segmentation (USSS) aims to lear...
research
11/06/2020

Towards Efficient Scene Understanding via Squeeze Reasoning

Graph-based convolutional model such as non-local block has shown to be ...
research
03/23/2017

Self corrective Perturbations for Semantic Segmentation and Classification

Convolutional Neural Networks have been a subject of great importance ov...
research
02/28/2020

Revisiting Convolutional Neural Networks for Urban Flow Analytics

Convolutional Neural Networks (CNNs) have been widely adopted in raster-...
research
07/19/2022

Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective

Visual representation learning is the key of solving various vision prob...
