H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions

by Changlin Li, et al.

Capitalizing on the rapid development of neural networks, recent video frame interpolation (VFI) methods have achieved notable improvements. However, they still fall short on real-world videos containing large motions, where complex deformation and/or occlusion make interpolation extremely difficult. In this paper, we propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation. H-VFI contributes a hierarchical video interpolation transformer (HVIT) that learns a deformable kernel in a coarse-to-fine manner across multiple scales. The learnt deformable kernel is then used to convolve the input frames and predict the interpolated frame. Starting from the smallest scale, H-VFI successively updates the deformable kernel by a residual, based on the previously predicted kernels, the intermediate interpolated results and the hierarchical features from the transformer. A transformer block then predicts the bias and masks that refine the final outputs, based on the interpolated results. The advantage of such progressive approximation is that the large-motion frame interpolation problem is decomposed into several relatively simpler sub-tasks, which enables a very accurate prediction in the final results. Another noteworthy contribution of our paper is a large-scale, high-quality dataset, YouTube200K, which contains videos depicting a great variety of scenarios captured at high resolution and high frame rate. Extensive experiments on multiple frame interpolation benchmarks validate that H-VFI outperforms existing state-of-the-art methods, especially for videos with large motions.
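The coarse-to-fine residual idea can be illustrated with a toy sketch. This is not the authors' HVIT or their deformable kernel: it assumes two 1D "frames" related by a single global shift, and a brute-force search stands in for the learned transformer. The function names (`downsample`, `match_cost`, `coarse_to_fine_shift`) and the parameters `levels` and `radius` are hypothetical, chosen only for illustration. The point is that each scale only has to predict a small residual on top of the estimate carried up from the coarser scale.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def match_cost(a, b, shift):
    """Mean squared error between b[i] and a[i - shift] over their overlap."""
    pairs = [(a[i - shift], b[i]) for i in range(len(b)) if 0 <= i - shift < len(a)]
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def coarse_to_fine_shift(frame0, frame1, levels=4, radius=1):
    """Estimate a global shift, refining it by a small residual at each scale."""
    # Build both pyramids, finest scale first.
    pyr0, pyr1 = [frame0], [frame1]
    for _ in range(levels - 1):
        pyr0.append(downsample(pyr0[-1]))
        pyr1.append(downsample(pyr1[-1]))

    shift, residuals = 0, []
    # Walk from the coarsest scale to the finest.
    for a, b in zip(reversed(pyr0), reversed(pyr1)):
        shift *= 2  # carry the coarser estimate up to the current scale
        # Stand-in for a learned predictor: try only small residuals.
        residual = min(range(-radius, radius + 1),
                       key=lambda r: match_cost(a, b, shift + r))
        shift += residual
        residuals.append(residual)
    return shift, residuals


# Toy frames: a triangular bump that moves 12 samples to the right.
frame0 = [max(0, 8 - abs(i - 24)) for i in range(64)]
frame1 = [max(0, 8 - abs(i - 36)) for i in range(64)]

shift, residuals = coarse_to_fine_shift(frame0, frame1)

# Assumption: linear motion, so the middle frame is frame0 moved by half the shift.
half = shift // 2
middle = [frame0[i - half] if 0 <= i - half < len(frame0) else 0
          for i in range(len(frame0))]

print(shift, residuals)
```

On these toy frames the search recovers the full shift of 12 from per-scale residuals of at most one sample. Calling `coarse_to_fine_shift(frame0, frame1, levels=1)` with the same `radius` cannot get beyond a shift of 1, which is the sense in which the hierarchy splits a large motion into simpler sub-tasks.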


Deep Animation Video Interpolation in the Wild

In the animation industry, cartoon videos are usually produced at low fr...

Video Interpolation via Generalized Deformable Convolution

Video interpolation aims at increasing the frame rate of a given video b...

Video Frame Interpolation with Flow Transformer

Video frame interpolation has been actively studied with the development...

Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction

Inter prediction is an important module in video coding for temporal red...

Beyond Natural Motion: Exploring Discontinuity for Video Frame Interpolation

Video interpolation is the task that synthesizes the intermediate frame ...

Cross-Attention Transformer for Video Interpolation

We propose TAIN (Transformers and Attention for video INterpolation), a ...

LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Existing works on video frame interpolation (VFI) mostly employ deep neu...
