First image then video: A two-stage network for spatiotemporal video denoising
Video denoising aims to remove noise from noise-corrupted data and recover the true signal through spatiotemporal processing. Existing approaches to spatiotemporal video denoising tend to suffer from motion blur artifacts: the boundary of a moving object tends to appear blurry, especially when the object moves fast enough that optical flow estimation breaks down. In this paper, we address this challenge by designing a first-image-then-video two-stage denoising neural network. It consists of an image denoising module that spatially reduces intra-frame noise, followed by a regular spatiotemporal video denoising module. The intuition is simple yet effective: the first-stage image denoising reduces the noise level and therefore allows the second-stage spatiotemporal denoising to model and learn better everywhere, including along moving object boundaries. When trained end-to-end, this two-stage network achieves state-of-the-art performance on the Vimeo90K video denoising benchmark in terms of both denoising quality and computational efficiency. It also enables an unsupervised approach whose performance is comparable to that of existing supervised approaches.
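The first-image-then-video ordering can be illustrated with a toy sketch. This is not the paper's network. Both learned modules are swapped for classical filters: a per-frame box filter stands in for the image denoising module, and a sliding temporal average stands in for the spatiotemporal module. The names `spatial_denoise`, `temporal_denoise`, `two_stage_denoise`, `add_noise`, and `mse` are hypothetical. The sketch shows only the data flow: stage 1 processes each frame independently, and stage 2 then combines information across the already-cleaner frames.

```python
import random
from typing import List

Frame = List[List[float]]  # H x W grayscale frame
Video = List[Frame]        # T frames


def spatial_denoise(frame: Frame, radius: int = 1) -> Frame:
    """Stage 1 stand-in (hypothetical): intra-frame box filter, no temporal info."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # clip window at borders
                        total += frame[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def temporal_denoise(video: Video, radius: int = 1) -> Video:
    """Stage 2 stand-in (hypothetical): per-pixel average over a temporal window.

    Unlike a learned spatiotemporal module, this does no motion compensation.
    """
    t_len = len(video)
    h, w = len(video[0]), len(video[0][0])
    out = []
    for t in range(t_len):
        lo, hi = max(0, t - radius), min(t_len, t + radius + 1)
        n = hi - lo
        out.append([[sum(video[k][y][x] for k in range(lo, hi)) / n
                     for x in range(w)] for y in range(h)])
    return out


def two_stage_denoise(video: Video) -> Video:
    # Stage 1: reduce intra-frame noise in every frame independently.
    pre = [spatial_denoise(f) for f in video]
    # Stage 2: combine information across the pre-denoised frames.
    return temporal_denoise(pre)


def add_noise(video: Video, sigma: float, seed: int = 0) -> Video:
    """Add i.i.d. Gaussian noise with a fixed seed (for reproducible demos)."""
    rng = random.Random(seed)
    return [[[v + rng.gauss(0.0, sigma) for v in row] for row in f] for f in video]


def mse(a: Video, b: Video) -> float:
    """Mean squared error between two videos of equal shape."""
    total, n = 0.0, 0
    for fa, fb in zip(a, b):
        for ra, rb in zip(fa, fb):
            for va, vb in zip(ra, rb):
                total += (va - vb) ** 2
                n += 1
    return total / n
```

For example, calling `two_stage_denoise(add_noise(clean, sigma=0.1))` on a static synthetic clip lowers the error relative to `clean`. Because the toy stage 2 has no motion compensation, it would itself blur moving edges. The paper's claim is that a learned spatiotemporal module operating on pre-denoised frames handles such boundaries better, which this sketch does not attempt to reproduce.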