Information Bottleneck Approach to Spatial Attention Learning

by Qiuxia Lai, et al.

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information that reaches visual awareness when perceiving natural scenes, allowing near real-time information processing with limited computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanisms of deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information that passes through the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism yields attention maps that neatly highlight the regions of interest while suppressing backgrounds, and improves standard DNN structures on visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable with respect to the decision making of the DNNs, as verified in the experiments. Our code is available at
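
The quantization step described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the function names `quantize_attention` and `modulate`, the fixed anchor values, and the nearest-anchor rule are all assumptions made for illustration. In the paper the anchors are learnable and the attention map is variational, and neither is modeled here.

```python
from typing import List

Map2D = List[List[float]]


def quantize_attention(scores: Map2D, anchors: List[float]) -> Map2D:
    """Snap each continuous attention score to its nearest anchor value.

    Assumption: nearest-anchor assignment with fixed anchors; the paper
    learns the anchors during training, which this sketch does not do.
    """
    return [[min(anchors, key=lambda a: abs(a - s)) for s in row] for row in scores]


def modulate(features: List[Map2D], attention: Map2D) -> List[Map2D]:
    """Scale every channel of a C x H x W feature tensor by the H x W attention map."""
    return [
        [[f * a for f, a in zip(f_row, a_row)] for f_row, a_row in zip(channel, attention)]
        for channel in features
    ]


# Hypothetical 2 x 2 attention scores and three anchor values.
scores = [[0.05, 0.48], [0.93, 0.30]]
anchors = [0.0, 0.5, 1.0]
quantized = quantize_attention(scores, anchors)

# A single-channel 2 x 2 feature map, modulated by the quantized attention.
features = [[[2.0, 4.0], [6.0, 8.0]]]
modulated = modulate(features, quantized)

print(quantized)  # [[0.0, 0.5], [1.0, 0.5]]
print(modulated)  # [[[0.0, 2.0], [6.0, 4.0]]]
```

With only three anchor values, every spatial location can carry at most one of three attention levels. This is one way to picture how quantization limits the information the attention map lets through.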

