Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

03/30/2023
by Wen Wang et al.

Large-scale text-to-image diffusion models have achieved unprecedented success in image generation and editing, but it remains unclear how to extend this success to video editing. Recent initial attempts at video editing require large text-to-video datasets and substantial computational resources for training, which are often inaccessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. vid2vid-zero leverages off-the-shelf image diffusion models and requires no training on any video. At the core of our method are a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we exploit the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at <https://github.com/baaivision/vid2vid-zero>.
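The bi-directional temporal modeling mentioned above can be illustrated with a minimal sketch of cross-frame attention: instead of each frame attending only to its own tokens, queries from every frame attend to keys and values gathered from all frames, so information flows both forward and backward in time. This is a hypothetical NumPy illustration of the general idea, not the paper's actual implementation; the function name and shapes are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(q, k, v):
    """Bi-directional cross-frame attention (illustrative sketch).

    q, k, v: arrays of shape (F, N, d) holding per-frame queries,
    keys, and values for F frames with N tokens of dimension d.
    Each frame's queries attend to keys/values from ALL frames,
    giving test-time temporal modeling in both directions.
    """
    F, N, d = k.shape
    k_all = k.reshape(F * N, d)   # concatenate keys across frames
    v_all = v.reshape(F * N, d)   # concatenate values across frames
    out = np.empty_like(q)
    for f in range(F):
        # (N, F*N) attention weights over tokens of every frame
        attn = softmax(q[f] @ k_all.T / np.sqrt(d))
        out[f] = attn @ v_all
    return out
```

A useful sanity check: when all frames are identical, attending across the duplicated frames reduces to ordinary single-frame self-attention, since the softmax mass is split evenly over the duplicates.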


Related research

06/14/2023 | VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
Recently, diffusion-based generative models have achieved remarkable suc...

03/23/2023 | Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Recent text-to-video generation approaches rely on computationally heavy...

03/08/2023 | Video-P2P: Video Editing with Cross-attention Control
This paper presents Video-P2P, a novel framework for real-world video ed...

10/05/2022 | DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
We introduce the first work to explore web-scale diffusion models for ro...

08/21/2023 | EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
Motivated by the superior performance of image diffusion models, more an...

05/27/2023 | Towards Consistent Video Editing with Text-to-Image Diffusion Models
Existing works have advanced Text-to-Image (TTI) diffusion models for vi...

05/22/2023 | ControlVideo: Training-free Controllable Text-to-Video Generation
Text-driven diffusion models have unlocked unprecedented abilities in im...
