Spatio-Temporal Action Detection Under Large Motion

09/06/2022
by Gurkirt Singh, et al.
Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames. However, such pooling fails to accumulate meaningful spatiotemporal features when the position or shape of the actor varies substantially across frames, for example because of large camera motion, large actor shape deformation, or fast actor movement. In this work, we study the performance of cuboid-aware feature aggregation in action detection under large motion. Further, we propose to enhance actor feature representation under large motion by tracking actors and performing temporal feature aggregation along the respective tracks. We quantify actor motion by the intersection-over-union (IoU) between the boxes of action tubes/tracks at various fixed time scales: an action with large motion yields lower IoU over time, whereas slower actions maintain higher IoU. We find that track-aware feature aggregation consistently achieves a large improvement in action detection performance over the cuboid-aware baseline, especially for actions under large motion. As a result, we also report state-of-the-art results on the large-scale MultiSports dataset.
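The IoU-based notion of motion described in the abstract can be illustrated with a small sketch. The abstract does not give the exact definition used in the paper, so the following is an assumption: boxes are (x1, y1, x2, y2) tuples, one per frame of a track, and motion at a time scale of `step` frames is summarised as the mean IoU between each box and the box `step` frames later. The names `box_iou` and `track_motion_iou` are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_motion_iou(track: Sequence[Box], step: int) -> float:
    """Mean IoU between boxes `step` frames apart along one track.

    Hypothetical summary statistic (an assumption, not the paper's
    formula): lower values indicate larger motion at this time scale.
    """
    if step <= 0:
        raise ValueError("step must be a positive number of frames")
    if len(track) <= step:
        raise ValueError("track is too short for the requested step")
    ious = [box_iou(track[i], track[i + step]) for i in range(len(track) - step)]
    return sum(ious) / len(ious)


# A stationary actor versus one shifting 5 pixels to the right per frame.
static_track = [(0.0, 0.0, 10.0, 10.0)] * 4
moving_track = [(5.0 * i, 0.0, 5.0 * i + 10.0, 10.0) for i in range(4)]

static_iou = track_motion_iou(static_track, step=1)
moving_iou_1 = track_motion_iou(moving_track, step=1)
moving_iou_2 = track_motion_iou(moving_track, step=2)
```

Under these assumptions the stationary track keeps an IoU of 1 at every time scale, while the moving track's IoU drops as `step` grows, which matches the abstract's statement that large motion results in lower IoU over time.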
