1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

12/27/2022
by Zhiwei Hu, et al.

Referring video object segmentation (RVOS) aims to segment, in the frames of a given video, the object that a referring expression describes. Previous methods adopt multi-stage approaches and design complex pipelines to obtain promising results. Recently, end-to-end Transformer-based methods have proved superior. In this work, we draw on the advantages of both lines of work to provide a simple and effective pipeline for RVOS. First, we improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with the language descriptions. Second, starting from a reliable, high-quality keyframe, we leverage the strong performance of a video object segmentation model to further enhance the quality and temporal consistency of the masks. Our single model reaches 70.3 J&F on the Referring YouTube-VOS validation set and 63.0 on the test set. After ensembling, we achieve 64.1 on the final leaderboard, ranking 1st in the CVPR 2022 Referring YouTube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.
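
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the keyframe is chosen or in which direction masks are propagated, so the sketch assumes the keyframe is the frame with the highest per-frame confidence score and that masks are propagated outward from it in both directions. The names `select_keyframe`, `refine_with_vos`, and the `propagate` callable (a stand-in for a video object segmentation model) are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A mask can be any object here (e.g. an array); the sketch never inspects it.
Mask = Any


def select_keyframe(scores: Sequence[float]) -> int:
    """Return the index of the frame with the highest confidence score.

    Assumption: one confidence score per frame is available from the
    first-stage (language-guided) model. Ties resolve to the earliest frame.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def refine_with_vos(
    masks: Sequence[Mask],
    scores: Sequence[float],
    propagate: Callable[[Mask, int], Mask],
) -> Tuple[int, List[Mask]]:
    """Re-derive every non-keyframe mask by propagating from the keyframe.

    `propagate(prev_mask, t)` is a hypothetical stand-in for a VOS model:
    given the mask of a neighbouring frame, it returns a mask for frame t.
    """
    if len(masks) != len(scores):
        raise ValueError("masks and scores must have the same length")

    k = select_keyframe(scores)
    refined = list(masks)  # the keyframe mask itself is kept unchanged

    # Forward pass: frames after the keyframe.
    for t in range(k + 1, len(refined)):
        refined[t] = propagate(refined[t - 1], t)

    # Backward pass: frames before the keyframe.
    for t in range(k - 1, -1, -1):
        refined[t] = propagate(refined[t + 1], t)

    return k, refined
```

With a trivial `propagate` that simply copies the neighbouring mask, every frame ends up with the keyframe's mask. That shows the control flow only; a real VOS model would adapt the mask to each frame's content.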

research
07/26/2023

Tracking Anything in High Quality

Visual object tracking is a fundamental video task in computer vision. R...
research
06/20/2022

5th Place Solution for YouTube-VOS Challenge 2022: Video Object Segmentation

Video object segmentation (VOS) has made significant progress with the r...
research
11/29/2021

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The referring video object segmentation task (RVOS) involves segmentatio...
research
01/03/2022

Language as Queries for Referring Video Object Segmentation

Referring video object segmentation (R-VOS) is an emerging cross-modal t...
research
11/18/2022

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

This technical report describes our 2nd-place solution for the ECCV 2022...
research
04/10/2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

This paper presents Video K-Net, a simple, strong, and unified framework...
research
05/13/2023

A Two-Stage Real Image Deraining Method for GT-RAIN Challenge CVPR 2023 Workshop UG^2+ Track 3

In this technical report, we briefly introduce the solution of our team ...
