Multi-Level Recurrent Residual Networks for Action Recognition

11/22/2017
by   Zhenxing Zheng, et al.

Most existing Convolutional Neural Networks (CNNs) used for action recognition are either difficult to optimize or underuse crucial temporal information. Inspired by the fact that LSTM has consistently achieved breakthroughs on sequence-related tasks, we propose a novel Multi-Level Recurrent Residual Networks (MRRN) model that incorporates three separate recognition streams. The proposed model captures spatiotemporal information by employing ResNets to learn spatial representations from static frames and stacked SRUs to learn temporal dynamics. The three distinct-level models are fused by averaging their softmax scores to obtain complementary video representations. They are trained end-to-end with greater efficiency than state-of-the-art models. Our contributions are as follows. First, we qualitatively analyze the effect of diverse hyper-parameter settings to illustrate the general tendency of performance. Second, we experiment with low-, mid-, and high-level representations of the video under various temporal pooling schemes, demonstrating experimentally how much each level of representation contributes to action recognition. Third, we compare the computational complexity of competitive methods to verify the efficiency of our model. Finally, a series of experiments is carried out on two standard video action benchmarks, HMDB-51 and UCF-101. Experimental results show that MRRN exceeds the majority of models that take only RGB data as input and obtains performance comparable to the state of the art without additional data, achieving 51.3 and 81.9
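The fusion step described in the abstract, averaging the softmax scores of the three streams, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `fuse_streams`, the three-class setup, and the logit values are all hypothetical, chosen only to show how late fusion by score averaging works.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits):
    """Late fusion: average the per-class softmax scores of all streams."""
    probs = [softmax(logits) for logits in stream_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Hypothetical per-video logits from the low-, mid-, and high-level streams
# (illustrative values only, not taken from the paper).
low_level = [2.0, 1.0, 0.1]
mid_level = [0.5, 2.5, 0.3]
high_level = [0.2, 2.0, 1.5]

fused = fuse_streams([low_level, mid_level, high_level])

# The predicted action is the class with the highest averaged score.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because each stream's scores are normalized before averaging, every stream contributes equally to the fused prediction regardless of the scale of its raw logits.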


