STEP: Spatio-Temporal Progressive Learning for Video Action Detection

04/19/2019
by Xitong Yang, et al.

In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach progressively refines the proposals towards actions over a few steps. In this way, high-quality proposals (i.e., ones that adhere to action movements) can be gradually obtained at later steps by leveraging the regression outputs from previous steps. At each step, we adaptively extend the proposals in time to incorporate more related temporal context. Compared to prior work that performs action detection in one run, our progressive learning framework is able to naturally handle the spatial displacement within action tubes and therefore provides a more effective way for spatio-temporal modeling. We extensively evaluate our approach on UCF101 and AVA, and demonstrate superior detection results. Remarkably, we achieve mAP of 75.0% and 18.6% on the two datasets with 3 progressive steps, using only 11 and 34 initial proposals, respectively.
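The control flow described in the abstract (start from coarse proposals, then at each step extend them in time and feed the previous regression output into the next step) can be illustrated with a minimal sketch. This is not the authors' implementation: `Proposal`, `extend_in_time`, `progressive_refine`, and `toy_regressor` are hypothetical names, the learned regressor is replaced by a toy function that moves each box halfway toward a fixed target, and the adaptive temporal extension is simplified to a fixed number of frames per step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized coordinates


@dataclass
class Proposal:
    box: Box
    start: int  # first frame index covered (inclusive)
    end: int    # last frame index covered (inclusive)


def extend_in_time(p: Proposal, extra: int, num_frames: int) -> Proposal:
    """Grow the temporal span by `extra` frames on each side, clamped to the video."""
    return Proposal(p.box, max(0, p.start - extra), min(num_frames - 1, p.end + extra))


def progressive_refine(
    proposals: List[Proposal],
    regress: Callable[[Proposal], Box],
    num_steps: int,
    extra: int,
    num_frames: int,
) -> List[Proposal]:
    """Refine proposals over several steps; each step reuses the previous output."""
    current = list(proposals)
    for _ in range(num_steps):
        # Assumption: a fixed extension stands in for the adaptive one in the paper.
        current = [extend_in_time(p, extra, num_frames) for p in current]
        # The regressor sees the already-refined box from the previous step.
        current = [Proposal(regress(p), p.start, p.end) for p in current]
    return current


def toy_regressor(target: Box, rate: float = 0.5) -> Callable[[Proposal], Box]:
    """Stand-in for a learned regressor: move each coordinate `rate` of the way to `target`."""
    def regress(p: Proposal) -> Box:
        x1, y1, x2, y2 = (b + rate * (t - b) for b, t in zip(p.box, target))
        return (x1, y1, x2, y2)
    return regress


if __name__ == "__main__":
    initial = [Proposal((0.0, 0.0, 1.0, 1.0), start=10, end=15)]
    refined = progressive_refine(
        initial, toy_regressor((0.2, 0.2, 0.6, 0.6)), num_steps=3, extra=2, num_frames=30
    )
    print(refined)
```

In this toy setting each step halves the remaining distance to the target box, which mirrors the idea that later steps start from better proposals than earlier ones; a real detector would replace `toy_regressor` with a network operating on video features.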
