Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

by Libo Wang, et al.

The fully convolutional network (FCN) with an encoder-decoder architecture has become the standard paradigm for semantic segmentation. The encoder captures multi-level feature maps, which the decoder then incorporates into the final prediction. Because context is critical for precise segmentation, considerable effort has gone into extracting it more effectively, for example by employing dilated/atrous convolutions or inserting attention modules. However, these approaches are all built on the FCN architecture with a ResNet backbone, which cannot address the context issue at its root. By contrast, we introduce the Swin Transformer as the backbone to fully extract context information. We also design a novel decoder, the densely connected feature aggregation module (DCFAM), to restore the resolution and generate the segmentation map. Extensive experiments on two datasets demonstrate the effectiveness of the proposed scheme.
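The abstract does not describe DCFAM's internals. The following is therefore only a minimal NumPy sketch, under our own assumptions, of the general idea it names: densely aggregating multi-level encoder features to restore resolution.

Some details follow known facts and others are illustrative:

- The stage strides (4, 8, 16, 32) and channel widths (96, 192, 384, 768) match the default Swin-T configuration.
- The random weights, nearest-neighbour upsampling and concatenation-based fusion are hypothetical stand-ins, not the authors' design. So are the helper names `fake_encoder`, `upsample_nearest`, `conv1x1` and `dense_decoder`.
- In this sketch, "densely connected" means that each decoder step receives the outputs of every coarser step, upsampled to its own resolution.

```python
import numpy as np

# Assumed Swin-T-style pyramid: output strides and channel widths of the four stages.
STRIDES = (4, 8, 16, 32)
CHANNELS = (96, 192, 384, 768)

def fake_encoder(image_hw, rng):
    """Hypothetical stand-in for a backbone: random (C, H/s, W/s) maps, one per stage."""
    h, w = image_hw
    return [rng.standard_normal((c, h // s, w // s)) for c, s in zip(CHANNELS, STRIDES)]

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(x, out_channels, rng):
    """1x1 convolution with random weights, i.e. a per-pixel linear map over channels."""
    weight = rng.standard_normal((out_channels, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum("oc,chw->ohw", weight, x)

def dense_decoder(features, width, num_classes, rng):
    """Hypothetical densely connected top-down decoder (not DCFAM itself)."""
    outputs = []  # (fused map, stride) for every decoder step processed so far
    # Walk from the coarsest stage (stride 32) to the finest (stride 4).
    for feat, stride in zip(reversed(features), reversed(STRIDES)):
        inputs = [conv1x1(feat, width, rng)]  # project the current encoder stage
        for prev, prev_stride in outputs:     # dense links: all coarser outputs
            inputs.append(upsample_nearest(prev, prev_stride // stride))
        fused = np.maximum(conv1x1(np.concatenate(inputs, axis=0), width, rng), 0.0)  # ReLU
        outputs.append((fused, stride))
    finest, finest_stride = outputs[-1]
    logits = conv1x1(finest, num_classes, rng)
    return upsample_nearest(logits, finest_stride)  # restore input resolution

rng = np.random.default_rng(0)
feats = fake_encoder((64, 64), rng)  # input height and width must be divisible by 32
seg_logits = dense_decoder(feats, width=32, num_classes=6, rng=rng)
print(seg_logits.shape)  # (6, 64, 64): one score map per class at full resolution
```

The class count of 6 and the decoder width of 32 are arbitrary choices for the example. A trained model would learn the projection weights and would typically use bilinear interpolation rather than nearest-neighbour upsampling.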


Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Most recent semantic segmentation methods adopt a fully-convolutional ne...

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Both performance and efficiency are important to semantic segmentation. ...

Encoded Hourglass Network for Semantic Segmentation of High Resolution Aerial Imagery

Fully Convolutional Network (FCN) has been widely used in recent work fo...

PRSeg: A Lightweight Patch Rotate MLP Decoder for Semantic Segmentation

The lightweight MLP-based decoder has become increasingly promising for ...

Guided Upsampling Network for Real-Time Semantic Segmentation

Semantic segmentation architectures are mainly built upon an encoder-dec...

Deep Smoke Segmentation

Inspired by the recent success of fully convolutional networks (FCN) in ...

Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective

Visual representation learning is the key of solving various vision prob...
