Guess Where? Actor-Supervision for Spatiotemporal Action Localization

04/05/2018
by Victor Escorcia, et al.

This paper addresses the problem of spatiotemporal localization of actions in videos. Whereas leading approaches all learn to localize from carefully annotated boxes on training video frames, we adopt a weakly-supervised solution that requires only a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations to localize actions. We make two contributions. First, we propose actor proposals derived from an image detector for human and non-human actors, which are linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an end-to-end trainable actor-based attention mechanism that enables localization of actions from action class labels and actor proposals. Experiments on three human and non-human action datasets show that actor supervision is state-of-the-art for weakly-supervised action localization and is even competitive with some fully-supervised alternatives.
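
As a rough illustration of the first contribution, the sketch below links per-frame actor detections into a single tube by similarity matching. It is not the authors' implementation. It assumes each detection already carries an embedding vector, and it uses plain cosine similarity as a stand-in for the learned Siamese similarity. The function names `cosine` and `link_actor_tube`, the `(box, embedding)` data layout, and the rule of carrying the previous box forward when a frame has no detection are all hypothetical choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_actor_tube(frames, start_index=0):
    """Greedily link one detection per frame into an actor tube.

    frames: list of frames; each frame is a list of (box, embedding) pairs.
            The first frame is assumed to contain at least one detection.
    start_index: which detection in the first frame seeds the tube.
    Returns a list of boxes, one per frame.
    """
    box, emb = frames[0][start_index]
    tube = [box]
    for detections in frames[1:]:
        if not detections:
            # Assumption: with no detection, repeat the previous box.
            tube.append(box)
            continue
        # Pick the detection most similar to the previously matched one,
        # so the reference embedding follows the actor as it deforms.
        box, emb = max(detections, key=lambda d: cosine(emb, d[1]))
        tube.append(box)
    return tube


frames = [
    [((0, 0, 10, 10), [1.0, 0.0]), ((50, 50, 60, 60), [0.0, 1.0])],
    [((52, 51, 62, 61), [0.1, 0.9]), ((1, 0, 11, 10), [0.9, 0.1])],
    [],
    [((3, 0, 13, 10), [1.0, 0.2])],
]
tube = link_actor_tube(frames, start_index=0)
```

Updating the reference embedding at every step, rather than always comparing against the first frame, is one simple way to tolerate gradual changes in actor appearance. A learned similarity network would replace `cosine` here.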

research
07/08/2018

Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

The goal of this paper is spatio-temporal localization of human actions ...
research
07/28/2021

Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection

The dominant paradigm in spatiotemporal action detection is to classify ...
research
12/06/2018

Video Action Transformer Network

We introduce the Action Transformer model for recognizing and localizing...
research
06/08/2021

Few-Shot Action Localization without Knowing Boundaries

Learning to localize actions in long, cluttered, and untrimmed videos is...
research
12/02/2018

Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries

In this paper, we propose an end-to-end capsule network for pixel level ...
research
09/06/2022

Spatio-Temporal Action Detection Under Large Motion

Current methods for spatiotemporal action tube detection often extend a ...
research
04/29/2021

Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning

Event perception tasks such as recognizing and localizing actions in str...