Exploiting the ConvLSTM: Human Action Recognition using Raw Depth Video-Based Recurrent Neural Networks

by Adrian Sanchez-Caballero, et al.

As in many other fields, deep learning has become the main approach in most computer vision applications, such as scene understanding, object recognition, computer-human interaction and human action recognition (HAR). Research efforts within HAR have mainly focused on how to efficiently extract and process both the spatial and the temporal dependencies of video sequences. In this paper, we propose and compare two neural networks based on the convolutional long short-term memory unit (ConvLSTM), which differ in their architecture and in their long-term learning strategy. The first (stateless) uses a video-length adaptive input data generator, whereas the second (stateful) exploits the stateful ability of general recurrent neural networks, applied here to the particular case of HAR. This stateful property allows the model to accumulate discriminative patterns from previous frames without compromising computer memory. Experimental results on the large-scale NTU RGB+D dataset show that the proposed models achieve competitive recognition accuracies at a lower computational cost than state-of-the-art methods, and that, in the particular case of videos, the rarely used stateful mode of recurrent neural networks significantly improves on the accuracy obtained with the standard mode. The recognition accuracies obtained are 75.26% (CS) and 75.45% (CV) for the stateless model, with an average time consumption of 0.21 s per video, and 80.43% (CS) and 79.91% (CV) for the stateful model, with 0.89 s per video.
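To make the stateful idea concrete, the following is a minimal sketch, not the architecture described in the paper. It assumes a toy scalar recurrent cell in place of a ConvLSTM, hypothetical per-frame features, and a simplified notion of "stateless" in which the state is reset for every chunk of frames. All function names and weights are illustrative assumptions. It shows that carrying the state across chunks reproduces a full pass over the video while only one chunk needs to be held in memory at a time.

```python
import math

def rnn_step(state, frame, w_state=0.5, w_in=0.3):
    """One step of a toy scalar recurrent cell (assumed stand-in for a ConvLSTM)."""
    return math.tanh(w_state * state + w_in * frame)

def run_chunk(frames, state=0.0):
    """Run the cell over a chunk of frames and return the final state."""
    for frame in frames:
        state = rnn_step(state, frame)
    return state

def chunks(frames, size):
    """Split a frame sequence into consecutive chunks of at most `size` frames."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def stateful_pass(frames, size):
    """Carry the state across chunks; it is reset only at the start of a video."""
    state = 0.0
    for chunk in chunks(frames, size):
        state = run_chunk(chunk, state)
    return state

def stateless_pass(frames, size):
    """Reset the state for every chunk, so only the last chunk shapes the output."""
    state = 0.0
    for chunk in chunks(frames, size):
        state = run_chunk(chunk, 0.0)
    return state

video = [0.1 * i for i in range(12)]       # hypothetical per-frame features
full = run_chunk(video)                    # whole video processed in one pass
stateful = stateful_pass(video, size=4)    # chunked, state carried over
stateless = stateless_pass(video, size=4)  # chunked, state reset per chunk
```

In this sketch the stateful pass matches the full pass, whereas the reset-per-chunk pass depends only on the final four frames. The paper's stateless model instead relies on a video-length adaptive input generator, which this toy example does not attempt to reproduce.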


3DFCNN: Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information

Human actions recognition is a fundamental task in artificial vision, th...

Two-Stream RNN/CNN for Action Recognition in 3D Videos

The recognition of actions from video sequences has many applications in...

A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition

In the last few years, compression of deep neural networks has become an...

Action Recognition using Visual Attention

We propose a soft attention based model for the task of action recogniti...

Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems

This chapter aims to aid the development of Cyber-Physical Systems (CPS)...

Adaptive Detrending to Accelerate Convolutional Gated Recurrent Unit Training for Contextual Video Recognition

Based on the progress of image recognition, video recognition has been e...

When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data

Human action recognition from skeletal data is a hot research topic and ...
