Joint Network based Attention for Action Recognition

11/16/2016
by Yemin Shi, et al.

By extracting spatial and temporal features in one network, two-stream ConvNets achieve state-of-the-art performance in action recognition. However, such a framework typically suffers from processing spatial and temporal information separately in two standalone streams, and it struggles to capture the long-term temporal dependence of an action. More importantly, it is incapable of finding the salient portions of an action, i.e., the frames that are most discriminative for identifying the action. To address these problems, a joint network based attention (JNA) is proposed in this study. We find that fully-connected fusion, branch selection, and a spatial attention mechanism are infeasible for action recognition. In our joint network, therefore, the spatial and temporal branches share some information during the training stage. We also introduce an attention mechanism in the temporal domain to capture long-term dependence while finding the salient portions. Extensive experiments are conducted on two benchmark datasets, UCF101 and HMDB51. The results show that our method improves action recognition performance significantly and achieves state-of-the-art results on both datasets.
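The temporal attention described above amounts to scoring each frame's feature vector, normalizing the scores with a softmax, and pooling the frames by those weights, so that discriminative (salient) frames dominate the video descriptor. A minimal NumPy sketch of that idea follows; the scoring vector `w` stands in for parameters that would be learned end-to-end, and this is an illustrative simplification, not the paper's exact JNA formulation.

```python
import numpy as np

def temporal_attention(frame_features, w):
    """Softmax-weighted pooling over the time axis.

    frame_features: (T, D) array of per-frame descriptors.
    w: (D,) scoring vector (learned in practice; random here for illustration).
    Returns the (D,) video descriptor and the (T,) attention weights.
    """
    scores = frame_features @ w                      # one relevance score per frame, shape (T,)
    scores = scores - scores.max()                   # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over frames
    video_descriptor = weights @ frame_features      # attention-pooled feature, shape (D,)
    return video_descriptor, weights

# Toy usage: 16 frames with 32-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32))
w = rng.normal(size=32)
desc, attn = temporal_attention(feats, w)
```

Frames with high `attn` values are the "salient portions": they contribute most to the pooled descriptor, while uninformative frames are down-weighted rather than averaged in uniformly.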

