Recurrent Self-Supervised Video Denoising with Denser Receptive Field

by Zichun Wang, et al.

Self-supervised video denoising has seen decent progress through the use of blind spot networks. However, under their blind spot constraints, previous self-supervised video denoising methods suffer from significant information loss and texture destruction in either the whole reference frame or neighbor frames, due to their inadequate consideration of the receptive field. Moreover, the limited number of available neighbor frames in previous methods leads to the discarding of distant temporal information. Nonetheless, simply adopting existing recurrent frameworks does not work, since they easily break the constraints on the receptive field imposed by self-supervision. In this paper, we propose RDRF for self-supervised video denoising, which not only fully exploits both the reference and neighbor frames with a denser receptive field, but also better leverages the temporal information from both local and distant neighbor features. First, towards a comprehensive utilization of information from both reference and neighbor frames, RDRF realizes a denser receptive field by taking more neighbor pixels along the spatial and temporal dimensions. Second, it features a self-supervised recurrent video denoising framework, which concurrently integrates distant and near-neighbor temporal features. This enables long-term bidirectional information aggregation, while mitigating error accumulation in the plain recurrent framework. Our method exhibits superior performance on both synthetic and real video denoising datasets. Codes will be available at
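The blind spot constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the RDRF architecture, only a toy single-layer example under stated assumptions: a hypothetical helper `blind_spot_conv` applies a 2D filter whose center weight is forced to zero, so the output at a pixel never depends on the noisy value at that same pixel. The function name, the 3x3 averaging kernel, and the random test image are all illustrative choices and do not come from the paper.

```python
import numpy as np

def blind_spot_conv(image, kernel):
    """Filter a 2D image with an odd-sized kernel whose center weight is zeroed.

    Hypothetical helper for illustration only: the zeroed center is the
    "blind spot", so out[y, x] never sees image[y, x].
    """
    k = kernel.astype(float).copy()
    kh, kw = k.shape
    k[kh // 2, kw // 2] = 0.0  # enforce the blind spot
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape
    # Accumulate shifted copies of the image, one per kernel tap.
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(8, 8))   # stand-in for a noisy frame
kernel = np.ones((3, 3)) / 8.0    # average of the 8 neighbors once the center is masked

baseline = blind_spot_conv(noisy, kernel)

# Perturb one input pixel and check which outputs react.
perturbed = noisy.copy()
perturbed[4, 4] += 10.0
changed = blind_spot_conv(perturbed, kernel)

center_unchanged = bool(np.isclose(baseline[4, 4], changed[4, 4]))
neighbor_changed = not bool(np.isclose(baseline[4, 5], changed[4, 5]))
print("output at perturbed pixel unchanged:", center_unchanged)
print("output at adjacent pixel changed:", neighbor_changed)
```

The check at the end shows the trade-off the abstract refers to: the masked pixel is excluded from its own prediction, which permits self-supervised training on noisy targets but also discards information at that location. Stacking layers or adding recurrent connections can reintroduce a dependence on the center pixel, which is why the receptive field must be constrained throughout the network.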




LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising

Despite the significant results on synthetic noise under simplified assu...

Self-Supervised training for blind multi-frame video denoising

We propose a self-supervised approach for training multi-frame video den...

Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots

Real noisy-clean pairs on a large scale are costly and difficult to obta...

ViDeNN: Deep Blind Video Denoising

We propose ViDeNN: a CNN for Video Denoising without prior knowledge on ...

An Image is Worth 16x16 Words, What is a Video Worth?

Leading methods in the domain of action recognition try to distill infor...

Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Existing unsupervised video-to-video translation methods fail to produce...

Diagnosing and Preventing Instabilities in Recurrent Video Processing

Recurrent models are becoming a popular choice for video enhancement tas...
