TraMNet - Transition Matrix Network for Efficient Action Tube Proposals

08/01/2018
by Gurkirt Singh, et al.

Current state-of-the-art methods solve spatiotemporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, generating sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera does. Assuming we evaluate n 2D anchors in each frame, the number of possible transitions from each 2D anchor to the next, over a sequence of f consecutive frames, is on the order of O(n^f), which is expensive even for small values of f. To avoid this problem, we introduce a Transition-Matrix-based Network (TraMNet), which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground truth bounding boxes across frames, and enforces sparsity via a transition threshold. As the resulting transition matrix is sparse and stochastic, this reduces the proposal hypothesis search space from O(n^f) to the cardinality of the thresholded matrix. At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network. At test time, a denser transition matrix can be obtained either by decreasing the threshold or by adding to it all the relative transitions originating from any cell location. This allows the network to handle transitions in the test data that might not have been present in the training data, and makes detection translation-invariant. Finally, we show that our network can handle sparse annotations such as those available in the DALY dataset. We report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24 datasets to support our claims.
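As a rough illustration of the search-space reduction described in the abstract, the sketch below thresholds a small row-stochastic transition matrix and compares the number of surviving anchor-to-anchor transitions with the exhaustive n^f count for f = 2. The matrix values, the threshold of 0.12, and the function name threshold_transitions are assumptions made up for this example; it is not the authors' implementation, and it does not estimate the probabilities from ground truth overlap as the paper does.

```python
def threshold_transitions(matrix, threshold):
    """Return the (i, j) anchor pairs whose transition probability exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, prob in enumerate(row)
        if prob > threshold
    ]


# Hypothetical transition probabilities between 4 anchors in consecutive frames.
# Each row sums to 1 (row-stochastic); the values are illustrative only.
transition = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.80, 0.15, 0.00],
    [0.00, 0.20, 0.70, 0.10],
    [0.00, 0.00, 0.05, 0.95],
]

n = len(transition)  # number of 2D anchors per frame
f = 2                # number of consecutive frames in a micro-tube

exhaustive = n ** f  # every anchor paired with every anchor
kept = threshold_transitions(transition, threshold=0.12)

print(f"exhaustive hypotheses: {exhaustive}")
print(f"hypotheses after thresholding: {len(kept)}")
print(f"kept transitions: {kept}")
```

Lowering the threshold in this sketch keeps more transitions, which loosely mirrors the abstract's remark that a denser matrix can be obtained at test time by decreasing the threshold.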


