Global-and-local attention networks for visual recognition

05/22/2018
by Drew Linsley, et al.

State-of-the-art deep convolutional networks (DCNs) such as squeeze-and-excitation (SE) residual networks implement a form of attention, also known as contextual guidance, which is derived from global image features. Here, we explore a complementary form of attention, known as visual saliency, which is derived from local image features. We extend the SE module with a novel global-and-local attention (GALA) module that combines both forms of attention, resulting in state-of-the-art accuracy on ILSVRC. We further describe ClickMe.ai, a large-scale online experiment in which human participants identify diagnostic image regions that are used to co-train a GALA network. Adding humans in the loop is shown to significantly improve network accuracy, while also yielding visual features that are more interpretable and more similar to those used by human observers.
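
The abstract describes the GALA module only at a high level. As a rough illustration, and not the paper's reference implementation, the PyTorch sketch below shows one way a global (SE-style, channel-wise) attention path and a local (spatial saliency) path could be combined to gate a feature map; the layer sizes, the additive combination, and the sigmoid gating are assumptions made here for clarity.

```python
import torch
import torch.nn as nn


class GALABlock(nn.Module):
    """Illustrative global-and-local attention (GALA) style block.

    Global path: SE-style channel attention computed from globally pooled features.
    Local path: a per-location saliency map computed from local features.
    The two attention signals are combined and used to gate the input feature map.
    The combination rule and layer sizes here are assumptions, not the paper's
    reference implementation.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Global (contextual) attention: squeeze-and-excitation over channels.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Local (saliency) attention: single-channel spatial map.
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_att(x)   # (N, C, 1, 1): channel-wise gains
        l = self.local_att(x)    # (N, 1, H, W): spatial saliency map
        # Combine global and local attention (additive here, as an assumption)
        # and squash to (0, 1) before gating the input features.
        a = torch.sigmoid(g + l)  # broadcasts to (N, C, H, W)
        return x * a


if __name__ == "__main__":
    block = GALABlock(channels=64)
    features = torch.randn(2, 64, 32, 32)
    print(block(features).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the global path rescales whole channels while the local path highlights spatial positions; gating the input with their combination is one plausible way to realize the "global-and-local" idea described in the abstract.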

Related research

01/10/2017 - What are the visual features underlying human versus machine vision?
Although Deep Convolutional Networks (DCNs) are approaching the accuracy...

03/29/2021 - CNN-based search model underestimates attention guidance by simple visual features
Recently, Zhang et al. (2018) proposed an interesting model of attention...

10/22/2021 - GCCN: Global Context Convolutional Network
In this paper, we propose Global Context Convolutional Network (GCCN) fo...

01/16/2022 - Global Regular Network for Writer Identification
Writer identification has practical applications for forgery detection a...

11/20/2018 - Fading of collective attention shapes the evolution of linguistic variants
Language change involves the competition between alternative linguistic ...

04/13/2022 - Deep Learning Model with GA based Feature Selection and Context Integration
Deep learning models have been very successful in computer vision and im...

07/05/2021 - Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context
In this paper we investigate the amount of spatial context required for ...
