
Efficient Hybrid Transformer: Learning Global-local Context for Urban Scene Segmentation

by Libo Wang, et al.

Semantic segmentation of fine-resolution urban scene images plays a vital role in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection and economic assessment. Driven by rapid developments in deep learning, convolutional neural networks (CNNs) have dominated semantic segmentation for many years. CNNs adopt a hierarchical feature representation and are strong at extracting local context. However, the local nature of the convolution layer limits the network's ability to capture global information, which is crucial for improving fine-resolution image segmentation. Recently, Transformers have become a hot topic in computer vision. The Vision Transformer has demonstrated a strong capability for modelling global information, boosting many vision tasks such as image classification, object detection and, especially, semantic segmentation. In this paper, we propose an efficient hybrid Transformer (EHT) for semantic segmentation of urban scene images. EHT combines the strengths of CNNs and Transformers, learning global-local context to strengthen the feature representation. Extensive experiments demonstrate that EHT achieves higher efficiency with competitive accuracy compared with state-of-the-art benchmark methods. Specifically, the proposed EHT achieves 67.0% mIoU on the UAVid test set and significantly outperforms other lightweight models. The code will be available soon.
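The abstract does not say how EHT fuses its CNN and Transformer parts, but the general idea of pairing a convolutional (local) path with a self-attention (global) path can be illustrated with a minimal NumPy sketch. This is a generic global-local block under stated assumptions, not EHT's actual architecture: the depthwise 3x3 convolution, single-head attention, residual summation as the fusion step, and the names `local_branch`, `global_branch`, `global_local_block` and `init_params` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def local_branch(x, kernel):
    """Local context (hypothetical): depthwise 3x3 convolution, zero padding.

    x: (H, W, C) feature map; kernel: (3, 3, C), one 3x3 filter per channel.
    Each output pixel only sees its 3x3 neighbourhood.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape)
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded map, weighted per channel.
            out += padded[i:i + h, j:j + w, :] * kernel[i, j, :]
    return out


def global_branch(x, wq, wk, wv):
    """Global context (hypothetical): single-head self-attention over all pixels.

    Every pixel (token) attends to every other pixel, so the receptive
    field covers the whole feature map in one step.
    """
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)                      # (HW, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # projections
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (HW, HW)
    attn = softmax(scores, axis=-1)                   # rows sum to 1
    return (attn @ v).reshape(h, w, -1)


def global_local_block(x, kernel, wq, wk, wv):
    """Fuse both paths by residual summation (an assumed fusion choice)."""
    return x + local_branch(x, kernel) + global_branch(x, wq, wk, wv)


def init_params(c, rng, scale=0.1):
    """Random weights for a block with `c` channels (illustrative only)."""
    return {
        "kernel": rng.standard_normal((3, 3, c)) * scale,
        "wq": rng.standard_normal((c, c)) * scale,
        "wk": rng.standard_normal((c, c)) * scale,
        "wv": rng.standard_normal((c, c)) * scale,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 16))   # toy 8x8 feature map, 16 channels
    y = global_local_block(x, **init_params(16, rng))
    print(y.shape)  # (8, 8, 16)
```

Note that the attention matrix in `global_branch` has shape (HW, HW), so its cost grows quadratically with the number of pixels. This is why capturing global context on fine-resolution images motivates efficiency-oriented designs, whereas the convolutional path costs only a fixed amount of work per pixel.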


BuildFormer: Automatic building extraction with vision transformer

Building extraction from fine-resolution remote sensing images plays a v...

SSformer: A Lightweight Transformer for Semantic Segmentation

It is well believed that Transformer performs better in semantic segment...

Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Domain-generalized urban-scene semantic segmentation (USSS) aims to lear...

Towards Efficient Scene Understanding via Squeeze Reasoning

Graph-based convolutional model such as non-local block has shown to be ...

Self corrective Perturbations for Semantic Segmentation and Classification

Convolutional Neural Networks have been a subject of great importance ov...

Revisiting Convolutional Neural Networks for Urban Flow Analytics

Convolutional Neural Networks (CNNs) have been widely adopted in raster-...

Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective

Visual representation learning is the key of solving various vision prob...