Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

by Zhenheng Yang, et al.

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. The task is both important and challenging: accurately localizing human actions in space and time is essential for analyzing large-scale video data. To tackle this problem, we propose a cascade proposal and location anticipation (CPLA) model for frame-level action detection. The model has two salient features: (1) a cascade region proposal network (casRPN) is adopted for action proposal generation and shows better localization accuracy than a single region proposal network (RPN); (2) the spatio-temporal consistency of actions is exploited via a location anticipation network (LAN), so frame-level action detection is not conducted independently for each frame. Frame-level detections are then linked by solving a linking score maximization problem and temporally trimmed into spatio-temporal action tubes. We demonstrate the effectiveness of our model on the challenging UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
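The linking step mentioned in the abstract can be illustrated with a small dynamic-programming sketch. The abstract does not define the linking score, so this example assumes a commonly used form: the sum of per-frame detection confidences plus a weighted IoU overlap between boxes in consecutive frames. The function names `iou` and `link_detections` and the parameter `overlap_weight` are hypothetical, not taken from the paper, and temporal trimming is not covered.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]],
                    overlap_weight: float = 1.0) -> List[int]:
    """Pick one detection index per frame so that the assumed linking score
    (sum of confidences + overlap_weight * IoU of consecutive boxes)
    is maximal. Solved with Viterbi-style dynamic programming."""
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one detection")

    # best[i]: best accumulated score of any path ending at detection i
    best = [score for _, score in frames[0]]
    backpointers: List[List[int]] = []

    for t in range(1, len(frames)):
        cur_best, cur_back = [], []
        for box, score in frames[t]:
            # Score of extending each path from the previous frame to this box.
            candidates = [
                best[j] + overlap_weight * iou(prev_box, box)
                for j, (prev_box, _) in enumerate(frames[t - 1])
            ]
            j_star = max(range(len(candidates)), key=candidates.__getitem__)
            cur_best.append(candidates[j_star] + score)
            cur_back.append(j_star)
        best = cur_best
        backpointers.append(cur_back)

    # Trace the highest-scoring path back from the last frame.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for cur_back in reversed(backpointers):
        idx = cur_back[idx]
        path.append(idx)
    path.reverse()
    return path


if __name__ == "__main__":
    toy_frames = [
        [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
        [((50, 50, 60, 60), 0.85), ((1, 1, 11, 11), 0.8)],
        [((2, 2, 12, 12), 0.9), ((50, 50, 60, 60), 0.1)],
    ]
    print(link_detections(toy_frames))
```

In the toy input, the slowly moving box near the origin is linked across all three frames even though a competing static box scores higher in the middle frame, because the overlap term rewards spatially consistent paths.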




A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions

Spatio-temporal action detection is an important and challenging problem...

A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos

Existing approaches for spatio-temporal action detection in videos are l...

Discovering Spatio-Temporal Action Tubes

In this paper, we address the challenging problem of spatial and tempora...

Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs

In this paper, we tackle the problem of spatio-temporal tagging of self-...

Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

Current state-of-the-art human action recognition is focused on the clas...

Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge

Current state-of-the-art human activity recognition is focused on the cl...

Object Detection in Videos by Short and Long Range Object Linking

We address the problem of detecting objects in videos with the interest ...
