RGB Stream Is Enough for Temporal Action Detection

07/09/2021
by Chenhao Wang, et al.

State-of-the-art temporal action detectors to date rely on two-stream input consisting of RGB frames and optical flow. Although combining RGB frames with optical flow boosts performance significantly, optical flow is a hand-designed representation that not only requires heavy computation but is also methodologically unsatisfactory, because two-stream methods are often not learned end-to-end jointly with the flow. In this paper, we argue that optical flow is dispensable in high-accuracy temporal action detection and that image-level data augmentation (ILDA) is the key to avoiding performance degradation when optical flow is removed. To evaluate the effectiveness of ILDA, we design DaoTAD, a simple yet efficient one-stage temporal action detector based on a single RGB stream. Our results show that, when trained with ILDA, DaoTAD achieves accuracy comparable to all existing state-of-the-art two-stream detectors while surpassing the inference speed of previous methods by a large margin, reaching 6668 fps on a GeForce GTX 1080 Ti. Code is available at <https://github.com/Media-Smart/vedatad>.
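To make the idea of image-level augmentation on a video clip concrete, here is a minimal sketch in NumPy. The abstract does not say which transforms ILDA comprises, so the choices below (random crop, horizontal flip, brightness shift) are assumptions for illustration only, as are the function name `augment_clip` and its parameters. The only point the sketch tries to convey is that the random parameters are drawn once per clip and applied to every frame, so the augmentation does not disturb motion between frames.

```python
import numpy as np


def augment_clip(clip, crop_size, rng, flip_prob=0.5, max_brightness_delta=32.0):
    """Apply one randomly drawn image-level transform to every frame of a clip.

    clip is a uint8 array of shape (T, H, W, 3). The transforms used here are
    illustrative assumptions, not the list of augmentations from the paper.
    """
    _, height, width, _ = clip.shape
    crop_h, crop_w = crop_size
    if crop_h > height or crop_w > width:
        raise ValueError("crop_size must fit inside the frame")

    # Draw the crop offset once so that all frames share the same window.
    top = int(rng.integers(0, height - crop_h + 1))
    left = int(rng.integers(0, width - crop_w + 1))
    out = clip[:, top:top + crop_h, left:left + crop_w, :]

    # Flip the whole clip horizontally, or not at all.
    if rng.random() < flip_prob:
        out = out[:, :, ::-1, :]

    # Add the same brightness offset to every frame, then clamp to [0, 255].
    delta = rng.uniform(-max_brightness_delta, max_brightness_delta)
    out = np.clip(out.astype(np.float32) + delta, 0.0, 255.0)
    return out.astype(np.uint8)


rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(8, 128, 128, 3), dtype=np.uint8)  # synthetic clip
augmented = augment_clip(clip, crop_size=(112, 112), rng=rng)
print(augmented.shape, augmented.dtype)
```

Because the transform operates on decoded RGB frames only, a pipeline of this kind needs no optical-flow extraction step, which is the property the abstract emphasizes.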

