Deep Attentional Structured Representation Learning for Visual Recognition

05/14/2018
by   Krishna Kanth Nakka, et al.
2

Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective to tackle complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, while effective, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this paper, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to predict jointly the image class label and an attention map in an end-to-end fashion and without any other supervision than the target label. As evidenced by our experiments, this consistently outperforms attention-less structured representation learning and yields state-of-the-art results on standard scene recognition and fine-grained categorization benchmarks.

READ FULL TEXT

page 8

page 9

page 15

page 16

page 17

page 18

page 19

research
03/05/2021

Variational Structured Attention Networks for Deep Visual Representation Learning

Convolutional neural networks have enabled major progress in addressing ...
research
05/22/2019

AttentionRNN: A Structured Spatial Attention Mechanism

Visual attention mechanisms have proven to be integrally important const...
research
08/25/2020

Discriminability Distillation in Group Representation Learning

Learning group representation is a commonly concerned issue in tasks whe...
research
07/09/2022

Learning Structured Representations of Visual Scenes

As the intermediate-level representations bridging the two levels, struc...
research
09/07/2015

An end-to-end generative framework for video segmentation and recognition

We describe an end-to-end generative approach for the segmentation and r...
research
07/07/2020

Structured (De)composable Representations Trained with Neural Networks

The paper proposes a novel technique for representing templates and inst...
research
09/08/2015

Deep Attributes from Context-Aware Regional Neural Codes

Recently, many researches employ middle-layer output of convolutional ne...

Please sign up or login with your details

Forgot password? Click here to reset