Reconstruction-guided attention improves the robustness and shape processing of neural networks

by Seoyoung Ahn, et al.

Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model on the challenging out-of-distribution digit recognition dataset MNIST-C, in which 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models tested, including feed-forward CNNs and adversarially trained networks. Our model is particularly robust to blur, noise, and occlusion corruptions, where shape perception plays an important role. Ablation studies further reveal two complementary roles of spatial and feature-based attention in robust object recognition: the former is largely consistent with the spatial masking benefits reported in the attention literature (the reconstruction serves as a mask), and the latter mainly contributes to the model's inference speed (i.e., the number of time steps needed to reach a given confidence threshold) by reducing the space of possible object hypotheses. We also observed that the model sometimes hallucinates a nonexistent pattern out of noise, leading to highly interpretable, human-like errors. Our study shows that modeling reconstruction-based feedback endows AI systems with a powerful attention mechanism, which can help us understand the role of generating percepts in human visual processing.
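The abstract describes the feedback loop only at a high level. The toy sketch below illustrates the general idea under loudly stated assumptions and is not the authors' model: the "encoder" is plain template matching, the "decoder" is a probability-weighted mix of class templates, the spatial mask is the reconstruction normalized to its peak, and feature-based attention is approximated by pruning to the `top_k` most likely hypotheses. All function names (`encode`, `decode`, `recognize`) and parameters are hypothetical. The loop stops once the top class reaches a confidence threshold, so the number of steps plays the role of inference speed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode(x, templates):
    """Hypothetical encoder: score each class by the dot product of input and template."""
    return [sum(a * b for a, b in zip(x, t)) for t in templates]


def decode(probs, templates):
    """Hypothetical decoder: reconstruction is the probability-weighted mix of templates."""
    n = len(templates[0])
    return [sum(p * t[i] for p, t in zip(probs, templates)) for i in range(n)]


def recognize(image, templates, threshold=0.9, max_steps=10,
              spatial=True, top_k=None):
    """Iterate encode -> decode -> mask until the top class reaches `threshold`.

    Returns (predicted_class, steps_used, confidence).
    """
    n_classes = len(templates)
    active = list(range(n_classes))  # object hypotheses still under consideration
    x = list(image)
    for step in range(1, max_steps + 1):
        # Feed-forward pass restricted to the active hypotheses.
        p_active = softmax(encode(x, [templates[c] for c in active]))
        probs = [0.0] * n_classes
        for c, p in zip(active, p_active):
            probs[c] = p
        best = max(range(n_classes), key=lambda c: probs[c])
        if probs[best] >= threshold:
            return best, step, probs[best]
        # Top-down feedback: reconstruct the object from the current beliefs.
        recon = decode(probs, templates)
        if spatial:
            # Spatial attention (assumed form): the peak-normalized reconstruction
            # is a soft mask on the ORIGINAL image, damping pixels it does not explain.
            peak = max(recon)
            mask = [r / peak for r in recon] if peak > 0 else [1.0] * len(recon)
            x = [pix * m for pix, m in zip(image, mask)]
        if top_k is not None:
            # Feature-based attention (simplified stand-in): keep the top_k hypotheses.
            active = sorted(active, key=lambda c: probs[c], reverse=True)[:top_k]
    return best, max_steps, probs[best]


if __name__ == "__main__":
    templates = [[1, 1, 0, 0], [0, 0, 1, 1]]  # toy 4-pixel "digits" 0 and 1
    noisy = [1, 1, 0.8, 0]                     # digit 0 plus a spurious blob
    print(recognize(noisy, templates, threshold=0.8))
    print(recognize(noisy, templates, threshold=0.8, max_steps=5, spatial=False))
    print(recognize(noisy, templates, threshold=0.95, top_k=1))
```

In this toy, masking with the reconstruction suppresses the spurious blob, so confidence rises from about 0.77 to about 0.85 on the second step, whereas without the mask it stays at about 0.77 indefinitely. Pruning hypotheses reaches a higher threshold in the same number of steps, loosely mirroring the speed benefit the abstract attributes to feature-based attention.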

