BSUV-Net 2.0: Spatio-Temporal Data Augmentations for Video-AgnosticSupervised Background Subtraction

by   M. Ozan Tezcan, et al.

Background subtraction (BGS) is a fundamental video processing task which is a key component of many applications. Deep learning-based supervised algorithms achieve very promising results in BGS, however, most of these algorithms are optimized for either a specific video or a group of videos, and their performance decreases significantly when applied to unseen videos. Recently, several papers addressed this problem and proposed video-agnostic supervised BGS algorithms. However, nearly all of the data augmentations used in these works are limited to spatial domain and do not account for temporal variations naturally occurring in video data. In this work, we introduce spatio-temporal data augmentations and apply it to one of the leading video-agnostic BGS algorithms, BSUV-Net. Our new model trained using the proposed data augmentations, named BSUV-Net 2.0, significantly outperforms the state-of-the-art algorithms evaluated on unseen videos. We also develop a real-time variant of our model named Fast BSUV-Net 2.0 with performance close to the state-of-the-art. Furthermore, we introduce a new cross-validation training and evaluation strategy for the CDNet-2014 dataset that makes it possible to fairly and easily compare the performance of various video-agnostic supervised BGS algorithms. The source code of BSUV-Net 2.0 will be published.


page 5

page 7


A Fully-Convolutional Neural Network for Background Subtraction of Unseen Videos

Background subtraction is a basic task in computer vision and video proc...

LAEO-Net: revisiting people Looking At Each Other in videos

Capturing the `mutual gaze' of people is essential for understanding and...

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework

Video deraining is an important task in computer vision as the unwanted ...

Graph CNN for Moving Object Detection in Complex Environments from Unseen Videos

Moving Object Detection (MOD) is a fundamental step for many computer vi...

Space-Time Crop Attend: Improving Cross-modal Video Representation Learning

The quality of the image representations obtained from self-supervised l...

Learning Long-Term Style-Preserving Blind Video Temporal Consistency

When trying to independently apply image-trained algorithms to successiv...

LapTool-Net: A Contextual Detector of Surgical Tools in Laparoscopic Videos Based on Recurrent Convolutional Neural Networks

We propose a new multilabel classifier, called LapTool-Net to detect the...

Please sign up or login with your details

Forgot password? Click here to reset