Video Region Annotation with Sparse Bounding Boxes

08/17/2020
by   Yuzheng Xu, et al.

Video analysis has been moving towards more detailed interpretation (e.g. segmentation), with encouraging progress. These tasks, however, increasingly rely on training data that is densely annotated in both space and time. Since such annotation is labour-intensive, few video datasets with detailed, densely annotated region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparsely annotated bounding boxes of the target regions. We achieve this with a Volumetric Graph Convolutional Network (VGCN), which learns to iteratively locate keypoints on the region boundaries using the spatio-temporal volume of surrounding appearance and motion. The global optimization of VGCN makes it significantly stronger, and better able to generalize, than existing solutions. Experimental results on two recent datasets (one real and one synthetic), including ablation studies, demonstrate the effectiveness and superiority of our method.
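The core idea of iterating graph convolutions over boundary keypoints can be illustrated with a minimal sketch. This is not the paper's VGCN: the cycle-graph connectivity, the feature and weight shapes, and the function names (`cycle_adjacency`, `gcn_refine`) are all illustrative assumptions. It only shows the general pattern of propagating per-keypoint features along a contour graph and regressing coordinate offsets, repeated for a few steps.

```python
import numpy as np

def cycle_adjacency(n):
    # Assumption: keypoints on a closed region boundary form a cycle graph,
    # each keypoint connected to its two neighbours along the contour.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i - 1) % n] = 1.0
        A[i, (i + 1) % n] = 1.0
    # Add self-loops and symmetrically normalise (standard GCN propagation).
    A_hat = A + np.eye(n)
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))

def gcn_refine(coords, features, W, steps=3):
    """Iteratively refine boundary keypoints, GCN-style (illustrative only).

    coords:   (n, 2) current keypoint positions
    features: (n, f) per-keypoint appearance/motion features
    W:        (f, 2) weights mapping aggregated features to 2-D offsets
    """
    A = cycle_adjacency(len(coords))
    for _ in range(steps):
        offsets = A @ features @ W   # propagate over the graph, project to offsets
        coords = coords + offsets    # move keypoints towards the true boundary
    return coords
```

In the actual method the per-keypoint features would come from a spatio-temporal volume of appearance and motion around each point, and the weights would be learned end-to-end; here they are placeholder arrays.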


research
12/23/2020

Efficient video annotation with visual interpolation and frame selection guidance

We introduce a unified framework for generic video annotation with bound...
research
02/19/2023

Accelerated Video Annotation driven by Deep Detector and Tracker

Annotating object ground truth in videos is vital for several downstream...
research
04/05/2023

Knowledge Combination to Learn Rotated Detection Without Rotated Annotation

Rotated bounding boxes drastically reduce output ambiguity of elongated ...
research
05/16/2021

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Spatio-temporal action detection is an important and challenging problem...
research
07/26/2017

Deep Interactive Region Segmentation and Captioning

With recent innovations in dense image captioning, it is now possible to...
research
05/28/2019

Improving Action Localization by Progressive Cross-stream Cooperation

Spatio-temporal action localization consists of three levels of tasks: s...
research
09/10/2018

The AAU Multimodal Annotation Toolboxes: Annotating Objects in Images and Videos

This tech report gives an introduction to two annotation toolboxes that ...
