VDTR: Video Deblurring with Transformer

04/17/2022
by   Mingdeng Cao, et al.

Video deblurring remains an unsolved problem due to the challenging spatio-temporal modeling it requires, and existing convolutional neural network (CNN)-based methods show limited capacity for effective spatial and temporal modeling. This paper presents VDTR, an effective Transformer-based model that makes the first attempt to adapt the Transformer for video deblurring. VDTR exploits the superior long-range and relation modeling capabilities of the Transformer for both spatial and temporal modeling. However, designing an appropriate Transformer-based model for video deblurring is challenging because of the complicated non-uniform blurs, the misalignment across multiple frames, and the high computational cost of high-resolution spatial modeling. To address these problems, VDTR performs attention within non-overlapping windows and exploits a hierarchical structure to model long-range dependencies. For frame-level spatial modeling, we propose an encoder-decoder Transformer that utilizes multi-scale features for deblurring. For multi-frame temporal modeling, we adapt the Transformer to fuse multiple spatial features efficiently. Compared with CNN-based methods, the proposed method achieves highly competitive results on both synthetic and real-world video deblurring benchmarks, including DVD, GOPRO, REDS, and BSD. We hope such a Transformer-based architecture can serve as a powerful alternative baseline for video deblurring and other video restoration tasks. The source code will be available at <https://github.com/ljzycmd/VDTR>.
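The abstract's key efficiency idea — restricting attention to non-overlapping windows so the cost scales with the window size rather than with the full resolution — can be illustrated with a minimal sketch. This is not the paper's implementation: it is a single-head toy version with identity Q/K/V projections, written in NumPy purely to show the partition → attend → reverse pattern; all function names here are hypothetical.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws*ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, ws):
    """Self-attention computed independently inside each window.

    Toy version: one head, no learned projections. Cost is
    O(num_windows * (ws^2)^2 * C) instead of O((H*W)^2 * C) for
    global attention over the whole frame.
    """
    H, W, C = x.shape
    win = window_partition(x, ws)                        # (nW, ws*ws, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(C)   # (nW, ws*ws, ws*ws)
    out = softmax(scores) @ win                          # (nW, ws*ws, C)
    # Reverse the partition back to an (H, W, C) feature map.
    out = out.reshape(H // ws, W // ws, ws, ws, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

feat = np.random.randn(8, 8, 16)
out = window_self_attention(feat, ws=4)
print(out.shape)  # (8, 8, 16)
```

With 4x4 windows on an 8x8 map, each token attends to only 16 positions instead of 64; at video resolutions this gap is what makes per-frame Transformer attention tractable, and the hierarchical (multi-scale) structure described in the abstract is what recovers long-range context across windows.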


