Two-Stream Transformer Architecture for Long Video Understanding

08/02/2022
by Edward Fish, et al.

Pure vision transformer architectures are highly effective for short video classification and action recognition tasks. However, due to the quadratic complexity of self-attention and a lack of inductive bias, transformers are resource-intensive and suffer from data inefficiencies. Long-form video understanding tasks amplify these data and memory efficiency problems, making current approaches infeasible to implement in data- or memory-restricted domains. This paper introduces an efficient Spatio-Temporal Attention Network (STAN), which uses a two-stream transformer architecture to model dependencies between static image features and temporal contextual features. Our proposed approach can classify videos up to two minutes in length on a single GPU, is data-efficient, and achieves state-of-the-art (SOTA) performance on several long video understanding tasks.
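To make the high-level idea concrete, the sketch below shows one way a two-stream spatio-temporal attention model over pre-extracted per-frame features could be organised in PyTorch. It is a minimal illustration under stated assumptions, not the paper's STAN implementation: the class name, layer sizes, and the cross-attention fusion between a static stream and a temporal stream are all assumptions made for exposition.

```python
# Minimal sketch (not the paper's implementation) of a two-stream
# spatio-temporal attention model over pre-extracted per-frame features.
# All names, hyperparameters, and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn


class TwoStreamSTAttention(nn.Module):
    def __init__(self, feat_dim=768, d_model=256, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # Static (spatial) stream: frame-local projection of image features.
        self.static_stream = nn.Sequential(
            nn.Linear(feat_dim, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        # Temporal stream: self-attention across the frame sequence,
        # capturing long-range temporal context.
        self.temporal_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # Cross-attention: temporal-context queries attend to static features,
        # modelling dependencies between the two streams.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, frame_feats):                  # (batch, n_frames, feat_dim)
        static = self.static_stream(frame_feats)     # frame-wise static features
        temporal = self.temporal_stream(static)      # temporal contextual features
        fused, _ = self.cross_attn(temporal, static, static)
        return self.classifier(fused.mean(dim=1))    # pooled clip-level logits


if __name__ == "__main__":
    # e.g. 2 clips of 512 frames, each a 768-d feature from a frozen image backbone
    feats = torch.randn(2, 512, 768)
    print(TwoStreamSTAttention()(feats).shape)       # torch.Size([2, 10])
```

In this sketch only the temporal stream pays the quadratic attention cost across frames, while the static stream stays frame-local; that split is one plausible way to keep memory tractable for clips of several hundred frames, in the spirit of the efficiency goals described in the abstract.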
