Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes

04/01/2022
by Samrudhdhi B Rangrej, et al.

Most hard attention models first observe a complete scene to locate and sense informative glimpses, and then predict the class label of the scene from those glimpses. However, in many applications (e.g., aerial imaging), observing an entire scene is not always feasible due to the limited time and resources available for acquisition. In this paper, we develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image and predicts informative glimpse locations solely based on past glimpses. We design our agent using DeiT-distilled and train it with a one-step actor-critic algorithm. Furthermore, to improve classification performance, we introduce a novel training objective, which enforces consistency between the class distribution predicted by a teacher model from a complete image and the class distribution predicted by our agent using glimpses. When the agent senses only 4 [...] our training objective yields 3 [...] datasets, respectively. Moreover, our agent outperforms previous state-of-the-art by observing nearly 27 [...] ImageNet and fMoW.
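
The consistency objective described above can be illustrated with a minimal sketch. The abstract does not state the exact form of the loss, so the KL divergence used here is an assumption, and the names `softmax`, `kl_divergence`, and `consistency_loss` are hypothetical helpers chosen for illustration, not the authors' code. The sketch compares the class distribution a teacher produces from a complete image with the one an agent produces from glimpses.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def consistency_loss(teacher_logits, agent_logits):
    """Assumed form of the consistency term: KL(teacher || agent).

    teacher_logits: class scores from a model that saw the full image.
    agent_logits:   class scores from the agent that saw only glimpses.
    """
    teacher_probs = softmax(teacher_logits)
    agent_probs = softmax(agent_logits)
    return kl_divergence(teacher_probs, agent_probs)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # illustrative values only
    agent = [1.0, 1.0, -0.5]
    print(consistency_loss(teacher, agent))
```

The loss is zero when the agent's distribution matches the teacher's and grows as the two diverge, which is the behaviour a consistency term needs. In practice such a term would be combined with the classification and actor-critic losses, with a weighting that this sketch leaves unspecified.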


