Mask-Free Video Instance Segmentation

03/28/2023
by Lei Ke, et al.

The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at https://github.com/SysCV/MaskFreeVis.
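The abstract describes the TK-Loss in three steps: patch matching across frames, K-nearest neighbor selection, and a consistency loss on the selected matches. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names (`box_sum`, `tk_loss_sketch`), the L1 patch distance, the search radius, the patch size, the optional distance threshold `max_dist`, the edge padding at image borders, and the specific agreement term `m_p * m_q + (1 - m_p) * (1 - m_q)` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def box_sum(x, patch):
    """Sum x over a patch x patch window around each pixel (edge-padded).

    `patch` is assumed to be odd so the window is centred on the pixel.
    """
    pad = patch // 2
    h, w = x.shape
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(patch):
        for j in range(patch):
            out += xp[i:i + h, j:j + w]
    return out


def tk_loss_sketch(frame_a, frame_b, mask_a, mask_b,
                   radius=2, patch=3, k=3, max_dist=np.inf, eps=1e-6):
    """Hypothetical temporal KNN-patch consistency loss between two frames.

    frame_a, frame_b: float arrays of shape (H, W, C).
    mask_a, mask_b:   mask probabilities in [0, 1], shape (H, W).
    All hyperparameters are illustrative assumptions.
    """
    h, w = mask_a.shape
    # Pad the second frame and mask so every displacement stays in bounds.
    fb = np.pad(frame_b, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    mb = np.pad(mask_b, radius, mode="edge")

    dists, cands = [], []
    # Step 1: patch matching. For each displacement inside the search window,
    # compute an L1 patch distance between frame_a and the shifted frame_b.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys, xs = radius + dy, radius + dx
            shifted = fb[ys:ys + h, xs:xs + w]
            diff = np.abs(frame_a - shifted).sum(axis=-1)
            dists.append(box_sum(diff, patch))
            cands.append(mb[ys:ys + h, xs:xs + w])
    dists = np.stack(dists)   # (num_displacements, H, W)
    cands = np.stack(cands)   # mask_b value at each candidate location

    # Step 2: K-nearest neighbor selection. Keep the k candidates with the
    # smallest patch distance for every pixel (one-to-many matches).
    idx = np.argpartition(dists, k - 1, axis=0)[:k]
    near_d = np.take_along_axis(dists, idx, axis=0)
    near_m = np.take_along_axis(cands, idx, axis=0)

    # Step 3: consistency loss. Penalise matched pixels whose mask
    # predictions disagree (both foreground or both background is "agree").
    agree = mask_a[None] * near_m + (1.0 - mask_a[None]) * (1.0 - near_m)
    per_match = -np.log(np.clip(agree, eps, 1.0))

    valid = near_d <= max_dist  # optionally drop poor matches
    if not valid.any():
        return 0.0
    return float(per_match[valid].mean())
```

The sketch has no trainable parameters and uses no labels, which mirrors the properties claimed in the abstract. In actual training the masks would be the model's predicted probabilities and the same computation would be written in an automatic-differentiation framework so that gradients reach the network.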


research
12/15/2022

Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration

Instance segmentation in videos, which aims to segment and track multipl...
research
03/23/2021

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Weakly supervised instance segmentation reduces the cost of annotations ...
research
07/28/2022

Video Mask Transfiner for High-Quality Video Instance Segmentation

While Video Instance Segmentation (VIS) has seen rapid progress, current...
research
11/15/2021

Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Video instance segmentation aims to detect, segment, and track objects i...
research
12/03/2022

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

In contrast to fully supervised methods using pixel-wise mask labels, bo...
research
03/26/2019

Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data

This paper presents a new method for shadow removal using unpaired data,...
research
11/29/2021

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The referring video object segmentation task (RVOS) involves segmentatio...
