Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream

08/25/2022
by   Yue Tang, et al.

Action detection in videos has been widely applied in on-device applications. Practical on-device videos are typically untrimmed, containing both action and background. It is desirable for a model to both recognize the class of an action and localize the temporal position where the action happens. Such a task is called temporal action localization (TAL), and TAL models are typically trained in the cloud, where multiple untrimmed videos are collected and labeled. Ideally, a TAL model would continuously and locally learn from new data, which can directly improve action detection precision while protecting customers' privacy. However, training a TAL model is non-trivial, since it requires a tremendous number of video samples with temporal annotations, and annotating videos frame by frame is prohibitively time-consuming and expensive. Although weakly-supervised TAL (W-TAL) has been proposed to learn from untrimmed videos with only video-level labels, such an approach is still not suitable for on-device learning scenarios. In practical on-device learning applications, data are collected as a stream, and dividing such a long video stream into multiple video segments requires substantial human effort, which hinders the application of TAL to realistic on-device learning settings. To enable W-TAL models to learn from a long, untrimmed streaming video, we propose an efficient video learning approach that can directly adapt to new environments. We first propose a self-adaptive video dividing approach with contrast score-based segment merging to convert the video stream into multiple segments. Then, we explore different sampling strategies on the TAL tasks to request as few labels as possible. To the best of our knowledge, this is the first attempt to directly learn from an on-device, long video stream.
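The abstract does not spell out the merging rule, so the sketch below is only an illustrative reading of contrast score-based segment merging: the stream is cut into fixed-length windows, adjacent windows are compared with a feature-contrast score, and low-contrast neighbors are merged into one segment. The function names, the cosine-distance contrast score, the window length, and the threshold are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): contrast score-based segment
# merging over a stream of per-frame features. Feature source, score, window
# size, and threshold are all hypothetical choices.
import numpy as np

def contrast_score(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Cosine distance between the mean features of two adjacent segments."""
    a, b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return 1.0 - cos  # higher score = more dissimilar neighbors

def merge_segments(frame_feats: np.ndarray, window: int = 16,
                   threshold: float = 0.2):
    """Cut the stream into fixed windows, then merge adjacent windows whose
    contrast score falls below `threshold`; returns (start, end) frame spans."""
    bounds = [(s, min(s + window, len(frame_feats)))
              for s in range(0, len(frame_feats), window)]
    merged = [bounds[0]]
    for start, end in bounds[1:]:
        prev_start, prev_end = merged[-1]
        score = contrast_score(frame_feats[prev_start:prev_end],
                               frame_feats[start:end])
        if score < threshold:           # similar content: extend previous segment
            merged[-1] = (prev_start, end)
        else:                           # likely boundary: start a new segment
            merged.append((start, end))
    return merged

# Toy example: 160 frames of "scene A" followed by 160 frames of "scene B".
feats = np.concatenate([np.random.randn(160, 64) + 5,
                        np.random.randn(160, 64) - 5])
print(merge_segments(feats))  # roughly [(0, 160), (160, 320)]
```

In practice the per-frame features would come from a pretrained video backbone rather than raw pixels, and the threshold would presumably be adapted to each stream, which is how we read the "self-adaptive" dividing step; the resulting segments are then candidates for which video-level labels are requested under the sampling strategies the abstract mentions.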

Related research

03/29/2022 · ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Weakly-supervised temporal action localization aims to recognize and loc...

05/06/2019 · Spatio-Temporal Action Localization in a Weakly Supervised Setting
Enabling computational systems with the ability to localize actions in v...

10/07/2016 · Weakly supervised learning of actions from transcripts
We present an approach for weakly supervised learning of human actions f...

06/20/2021 · Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling
Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to rec...

05/06/2021 · Weakly Supervised Action Selection Learning in Video
Localizing actions in video is a core task in computer vision. The weakl...

02/08/2023 · Weakly-supervised Representation Learning for Video Alignment and Analysis
Many tasks in video analysis and understanding boil down to the need for...

03/24/2021 · The Blessings of Unlabeled Background in Untrimmed Videos
Weakly-supervised Temporal Action Localization (WTAL) aims to detect the...
