Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation

09/21/2023
by   Ping Li, et al.

Unsupervised Video Object Segmentation (VOS) aims to identify the contours of primary foreground objects in videos without any prior knowledge. However, previous methods do not fully exploit spatial-temporal context and fail to tackle this challenging task in real time. This motivates us to develop an efficient Long-Short Temporal Attention network (termed LSTA) for the unsupervised VOS task from a holistic view. Specifically, LSTA consists of two dominant modules, i.e., Long Temporal Memory and Short Temporal Attention. The former captures the long-term global pixel relations between past frames and the current frame, modeling constantly present objects by encoding appearance patterns. The latter reveals the short-term local pixel relations between one nearby frame and the current frame, modeling moving objects by encoding motion patterns. To speed up inference, an efficient projection and a locality-based sliding window are adopted to achieve nearly linear time complexity for the two lightweight modules, respectively. Extensive empirical studies on several benchmarks demonstrate the promising performance and high efficiency of the proposed method.
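
The abstract describes two attention mechanisms: a projection-based global attention over memory frames, made near-linear by reducing the memory to a small set of landmark tokens, and a window-restricted local attention over one nearby frame. Below is a minimal sketch (assuming PyTorch) of how such modules could look; the class names, the adaptive-pooling landmark projection, and the default hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two attention mechanisms described above (assumptions:
# PyTorch, adaptive-pooling landmark projection, odd window size). Not the
# authors' implementation.
import torch
import torch.nn.functional as F
from torch import nn


class LongTemporalMemory(nn.Module):
    """Global attention between current-frame queries and past-frame keys/values,
    made near-linear by projecting the memory onto a small set of landmarks."""

    def __init__(self, dim: int, num_landmarks: int = 64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.num_landmarks = num_landmarks
        self.scale = dim ** -0.5

    def forward(self, curr: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # curr:   (B, N, C) flattened pixels of the current frame
        # memory: (B, M, C) flattened pixels of the past frames
        q = self.to_q(curr)
        k, v = self.to_kv(memory).chunk(2, dim=-1)
        # Pool the M memory tokens down to L landmarks (L << M), so the
        # attention map costs O(N * L) instead of O(N * M).
        L = min(self.num_landmarks, k.shape[1])
        k = F.adaptive_avg_pool1d(k.transpose(1, 2), L).transpose(1, 2)
        v = F.adaptive_avg_pool1d(v.transpose(1, 2), L).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v  # (B, N, C)


class ShortTemporalAttention(nn.Module):
    """Local attention: every current-frame pixel attends only to a small
    spatial window of a nearby frame (locality-based sliding window)."""

    def __init__(self, dim: int, window_size: int = 7):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim, 1)
        self.to_kv = nn.Conv2d(dim, 2 * dim, 1)
        self.window_size = window_size  # assumed odd
        self.scale = dim ** -0.5

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (B, C, H, W) feature maps of the current / nearby frame
        B, C, H, W = curr.shape
        w = self.window_size
        q = self.to_q(curr).flatten(2).transpose(1, 2)           # (B, HW, C)
        k, v = self.to_kv(prev).chunk(2, dim=1)                  # (B, C, H, W) each
        # Extract a w x w neighbourhood of the nearby frame around every pixel.
        k = F.unfold(k, w, padding=w // 2).view(B, C, w * w, H * W)
        v = F.unfold(v, w, padding=w // 2).view(B, C, w * w, H * W)
        k = k.permute(0, 3, 2, 1)                                # (B, HW, w*w, C)
        v = v.permute(0, 3, 2, 1)
        attn = torch.softmax((q.unsqueeze(2) * self.scale * k).sum(-1), dim=-1)
        out = (attn.unsqueeze(-1) * v).sum(dim=2)                # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)
```

With L landmarks and a w x w window, the two modules cost roughly O(N·L) and O(N·w²) per frame for N pixels, which is consistent with the nearly linear time complexity stated in the abstract.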


Related research

03/13/2020 · Dual Temporal Memory Network for Efficient Video Object Segmentation
Video Object Segmentation (VOS) is typically formulated in a semi-superv...

09/02/2020 · LSMVOS: Long-Short-Term Similarity Matching for Video Object Segmentation
Objective Semi-supervised video object segmentation refers to segmenting...

01/19/2020 · See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
We introduce a novel network, called CO-attention Siamese Network (COSNe...

07/31/2019 · On the difficulty of learning and predicting the long-term dynamics of bouncing objects
The ability to accurately predict the surrounding environment is a found...

07/20/2022 · ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
In this paper a pure-attention bottom-up approach, called ViGAT, that ut...

07/16/2023 · Holistic Prototype Attention Network for Few-Shot VOS
Few-shot video object segmentation (FSVOS) aims to segment dynamic objec...

06/14/2023 · LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Referring video object segmentation (RVOS) aims to segment the target in...
