Video Probabilistic Diffusion Models in Projected Latent Space

02/15/2023
by Sihyun Yu, et al.

Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos remains a challenge due to their high dimensionality and complex temporal dynamics combined with large spatial variations. Recent works on diffusion models have shown their potential to address this challenge, yet they suffer from severe computational and memory inefficiency that limits their scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model that learns the video distribution in a low-dimensional latent space and thus can be efficiently trained on high-resolution videos under limited resources. Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video into 2D-shaped latent vectors that factorize the complex cubic structure of video pixels, and (b) a diffusion model architecture specialized for our new factorized latent space, together with a training/sampling procedure for synthesizing videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM compared with previous video synthesis methods; e.g., PVDM obtains an FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, improving on the prior state-of-the-art score of 1773.4.
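To illustrate the factorization idea described in the abstract, the sketch below collapses a cubic video tensor into three image-like (2D-shaped) latents, one per axis. This is a minimal conceptual sketch, not PVDM's actual autoencoder: the learned encoder is replaced by a plain mean, and the shapes and variable names (video, z_s, z_h, z_w) are illustrative assumptions.

```python
import numpy as np

# Illustrative shapes only: channels, frames, height, width.
C, T, H, W = 3, 16, 64, 64
video = np.random.rand(C, T, H, W).astype(np.float32)

# Project the cubic video tensor onto three 2D-shaped latent planes by
# collapsing one axis at a time (a plain mean stands in for the learned encoder).
z_s = video.mean(axis=1)   # (C, H, W): content shared across frames
z_h = video.mean(axis=2)   # (C, T, W): variation along time and width
z_w = video.mean(axis=3)   # (C, T, H): variation along time and height

print(z_s.shape, z_h.shape, z_w.shape)  # (3, 64, 64) (3, 16, 64) (3, 16, 64)
```

Because each latent is 2D-shaped, a diffusion model built from ordinary image-style (2D) network blocks can denoise them, which is where the compute and memory savings over running 3D diffusion directly on raw video pixels come from.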

Related research

GD-VDM: Generated Depth for better Diffusion-based Video Generation (06/19/2023)
The field of generative models has recently witnessed significant progre...

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (04/18/2023)
Latent Diffusion Models (LDMs) enable high-quality image synthesis while...

Conditional Image-to-Video Generation with Latent Flow Diffusion Models (03/24/2023)
Conditional image-to-video (cI2V) generation aims to synthesize a new pl...

LDMVFI: Video Frame Interpolation with Latent Diffusion Models (03/16/2023)
Existing works on video frame interpolation (VFI) mostly employ deep neu...

Flexible Diffusion Modeling of Long Videos (05/23/2022)
We present a framework for video modeling based on denoising diffusion p...

VIDM: Video Implicit Diffusion Models (12/01/2022)
Diffusion models have emerged as a powerful generative method for synthe...

ControlMat: A Controlled Generative Approach to Material Capture (09/04/2023)
Material reconstruction from a photograph is a key component of 3D conte...
