Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images

by Yan Zhang, et al.

Very high-resolution (VHR) remote sensing (RS) image classification is a fundamental task in RS image analysis and understanding. Recently, transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of ordinary resolution (224x224 pixels) and have achieved remarkable results on general image classification tasks. However, the complexity of the naive transformer grows quadratically with image size, which prevents transformer-based models from being applied to VHR RS image (500x500 pixels) classification and other computationally expensive downstream tasks. To this end, we decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT) and propose an efficient complex self-attention (CSA) mechanism. Benefiting from the conjugate symmetry of the DFT, CSA can model high-order contextual information with less than half the computation of naive SA. To overcome gradient explosion in the Fourier complex field, we replace the Softmax function with a carefully designed Logmax function to normalize the attention map of CSA and stabilize gradient propagation. By stacking multiple layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model, which learns global contextual information from VHR aerial images in a hierarchical manner. Extensive experiments conducted on commonly used RS classification data sets demonstrate the effectiveness and efficiency of FCT, especially on very high-resolution RS images.
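
The conjugate symmetry that the abstract relies on can be checked with a small, self-contained sketch. This is not the authors' CSA mechanism; it only illustrates the general DFT property that, for a real-valued input of length N, coefficient N-k is the complex conjugate of coefficient k, so roughly half of the spectrum (N/2 + 1 coefficients) carries all the information. The function names `dft`, `is_conjugate_symmetric`, and `half_spectrum_size` and the sample signal are hypothetical, chosen for illustration only.

```python
import cmath


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def is_conjugate_symmetric(spectrum, tol=1e-9):
    """Check X[N-k] == conj(X[k]) for every k, which holds for real inputs."""
    n = len(spectrum)
    return all(
        abs(spectrum[k] - spectrum[(n - k) % n].conjugate()) <= tol
        for k in range(n)
    )


def half_spectrum_size(n):
    """Number of coefficients needed to represent a real signal of length n."""
    return n // 2 + 1


# Illustrative real-valued signal (assumed data, not taken from the paper).
real_signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -0.75, 4.0]
spectrum = dft(real_signal)

symmetric = is_conjugate_symmetric(spectrum)
kept = half_spectrum_size(len(real_signal))
print(f"conjugate symmetric: {symmetric}; coefficients to keep: {kept} of {len(spectrum)}")
```

Because the remaining coefficients are redundant, an operation defined on the spectrum of a real signal only needs to process the retained half, which is the kind of saving the abstract attributes to CSA.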




MMFormer: Multimodal Transformer Using Multiscale Self-Attention for Remote Sensing Image Classification

To benefit the complementary information between heterogeneous data, we ...

Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection

Popular Transformer networks have been successfully applied to remote se...

A Novel Multi-scale Attention Feature Extraction Block for Aerial Remote Sensing Image Classification

Classification of very high-resolution (VHR) aerial remote sensing (RS) ...

RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows

Recently, Transformers have shown promising performance in various visio...

Fourier Image Transformer

Transformer architectures show spectacular performance on NLP tasks and ...

RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework

In recent years, remote sensing (RS) vision foundation models such as Ri...

An improved tile-based scalable distributed management model of massive high-resolution satellite images

The amount of remote sensing (RS) data has increased at an unexpected sc...
