Automatic Action Annotation in Weakly Labeled Videos

by Waqas Sultani, et al.

Manual spatio-temporal annotation of human actions in videos is laborious, requires several annotators, and is subject to human bias. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all selected proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global, and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposal from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets: UCF Sports, sub-JHMDB, and THUMOS'13, and have obtained promising results compared to several baseline methods. Moreover, on UCF Sports, we demonstrate that action classifiers trained on these automatically obtained spatio-temporal annotations perform comparably to classifiers trained on ground-truth annotations.
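To make the cross-video selection step concrete, the following is a minimal sketch of choosing one proposal per video so that the chosen set is mutually similar. It is not the Generalized Maximum Clique Graph solver described above: it assumes each proposal is summarized by a single feature vector (a stand-in for the shape, global, and fine-grained similarities), scores pairs by cosine similarity, and uses a simple coordinate-wise local search. The function name `select_consistent_proposals` and its parameters are hypothetical.

```python
import numpy as np

def select_consistent_proposals(features, n_iters=10):
    """Pick one proposal index per video so the picks are mutually similar.

    features: list with one array per video, shape (n_proposals_i, d).
    Proposals are assumed to be sorted by rank, best first (an assumption
    of this sketch).
    """
    # L2-normalise rows so that dot products are cosine similarities.
    feats = [f / np.linalg.norm(f, axis=1, keepdims=True) for f in features]
    # Start from the top-ranked proposal (index 0) in every video.
    choice = [0] * len(feats)
    if len(feats) < 2:
        return choice  # nothing to make consistent with
    for _ in range(n_iters):
        changed = False
        for v, f in enumerate(feats):
            # Sum of the proposals currently selected in all other videos.
            others = sum(feats[u][choice[u]]
                         for u in range(len(feats)) if u != v)
            # Re-pick the proposal in video v most similar to that sum.
            best = int(np.argmax(f @ others))
            if best != choice[v]:
                choice[v] = best
                changed = True
        if not changed:
            break  # local optimum: no single video wants to switch
    return choice
```

Each update can only keep or increase the total pairwise similarity of the selected set, so the loop settles at a local optimum; a dedicated clique solver would be needed to search beyond that.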




