UAN: Unified Attention Network for Convolutional Neural Networks

by Tony Joseph, et al.
University of Ontario Institute of Technology

We propose a new architecture that learns to attend to different Convolutional Neural Network (CNN) layers (i.e., different levels of abstraction) and different spatial locations (i.e., specific locations within a given feature map) in a sequential manner to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) timestep, a CNN layer is selected and its output is processed by a spatial soft-attention mechanism. We refer to this architecture as the Unified Attention Network (UAN), since it combines the "what" and "where" aspects of attention, i.e., "what" level of abstraction to attend to and "where" the network should look. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based camera pose and orientation regression and (ii) indoor scene classification. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scene, and TUM-LSI datasets) and for scene classification (MIT-67 indoor dataset), and show that it improves upon the results of previous methods. Empirically, we show that combining the "what" and "where" aspects of attention improves network performance on both tasks.
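The per-timestep procedure described above (select a CNN layer, then apply spatial soft attention to its output) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring functions, the layer-selection rule, or the recurrent cell. The sketch assumes dot-product scoring, a hard argmax over softmax layer scores, a toy tanh update in place of a real RNN cell, and feature maps already projected to a common channel dimension. All names (`uan_step`, `spatial_attend`, `run`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spatial_attend(feature_map, hidden):
    """'Where': soft attention over the locations of one feature map.

    feature_map is a list of location vectors, each of length len(hidden).
    Returns the attention-weighted context vector and the weights.
    """
    weights = softmax([dot(loc, hidden) for loc in feature_map])
    dim = len(hidden)
    context = [
        sum(w * loc[d] for w, loc in zip(weights, feature_map))
        for d in range(dim)
    ]
    return context, weights


def uan_step(layers, hidden):
    """One timestep: pick a layer ('what'), then attend spatially ('where')."""
    dim = len(hidden)
    # Assumption: each layer is summarised by mean pooling before scoring.
    pooled = [
        [sum(loc[d] for loc in fmap) / len(fmap) for d in range(dim)]
        for fmap in layers
    ]
    layer_weights = softmax([dot(p, hidden) for p in pooled])
    # Assumption: hard selection of the highest-scoring layer.
    chosen = max(range(len(layers)), key=lambda i: layer_weights[i])
    context, spatial_weights = spatial_attend(layers[chosen], hidden)
    # Toy recurrent update standing in for an actual RNN cell.
    new_hidden = [math.tanh(h + c) for h, c in zip(hidden, context)]
    return new_hidden, chosen, spatial_weights


def run(layers, hidden, steps):
    """Unroll the attention loop and record which layer each step chose."""
    trace = []
    for _ in range(steps):
        hidden, chosen, spatial_weights = uan_step(layers, hidden)
        trace.append((chosen, spatial_weights))
    return hidden, trace


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical layers: a 3x3 map and a 2x2 map, 4 channels each.
    demo_layers = [
        [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
        for n in (9, 4)
    ]
    final_hidden, demo_trace = run(demo_layers, [0.1] * 4, steps=3)
    for t, (layer_index, weights) in enumerate(demo_trace):
        print(t, layer_index, [round(w, 3) for w in weights])
```

In a trained model the final hidden state would feed a task head (pose regression or scene classification). A hard argmax is not differentiable, so an actual system would need a soft or otherwise trainable selection rule; the sketch only shows the data flow.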




