
Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

by Suman Saha, et al.
University of Oxford
Oxford Brookes University

Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. We cast the action tube extraction as an energy maximisation problem in which configurations of region proposals in each frame are assigned a cost and the best action tubes are selected via two passes of dynamic programming. One pass associates region proposals in space and time for each action category, and another pass is used to solve for the tube's temporal extent and to enforce a smooth label sequence through the video. In addition, by taking advantage of recent work on action foreground-background segmentation, we are able to associate each tube with class-specific segmentations. We demonstrate the performance of our algorithm on the challenging LIRIS-HARL dataset and achieve a new state-of-the-art result which is 14.3 times better than previous methods.
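The two dynamic-programming passes described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the energy terms (a per-proposal class score plus an IoU overlap bonus between consecutive boxes), the binary action/background labelling, and the names `iou`, `link_proposals`, `smooth_labels` and `lam` are all assumptions chosen for illustration. Both passes are plain Viterbi-style maximisations.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_proposals(boxes, scores, lam=1.0):
    """Pass 1 (assumed energy): pick one proposal per frame maximising
    sum(class scores) + lam * sum(IoU between consecutive picks)."""
    best = list(scores[0])          # best energy of a path ending at each proposal
    back = []                       # back-pointers, one list per frame transition
    for t in range(1, len(boxes)):
        new_best, ptr = [], []
        for j, bj in enumerate(boxes[t]):
            cands = [best[i] + lam * iou(bi, bj)
                     for i, bi in enumerate(boxes[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[i_star] + scores[t][j])
            ptr.append(i_star)
        best = new_best
        back.append(ptr)
    # Backtrack from the best final proposal.
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]


def smooth_labels(tube_scores, lam=0.5):
    """Pass 2 (assumed energy): label each frame 1 (action) or 0 (background),
    maximising the summed unaries minus lam for every label change."""
    unary = [(1.0 - s, s) for s in tube_scores]
    best = list(unary[0])
    back = []
    for u in unary[1:]:
        new_best, ptr = [], []
        for lab in (0, 1):
            cands = [best[k] - lam * (k != lab) for k in (0, 1)]
            k_star = 0 if cands[0] >= cands[1] else 1
            new_best.append(cands[k_star] + u[lab])
            ptr.append(k_star)
        best = new_best
        back.append(ptr)
    lab = 0 if best[0] >= best[1] else 1
    labels = [lab]
    for ptr in reversed(back):
        lab = ptr[lab]
        labels.append(lab)
    return labels[::-1]


# Toy usage: three frames with two proposals each, then a noisy score track.
boxes = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (50, 50, 60, 60)],
    [(2, 2, 12, 12), (100, 100, 110, 110)],
]
scores = [[0.9, 0.8], [0.6, 0.7], [0.9, 0.1]]
path = link_proposals(boxes, scores)
labels = smooth_labels([0.1, 0.2, 0.9, 0.4, 0.8, 0.9, 0.1])
```

In this toy setup the first pass prefers the spatially consistent chain of boxes even though a competing proposal scores higher in the middle frame, and the second pass absorbs the single low-scoring frame into the surrounding action segment, which is how the label sequence yields a temporal extent for the tube.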



