Real-World Image Variation by Aligning Diffusion Inversion Chain

by Yuechen Zhang, et al.

Recent advances in diffusion models have enabled the generation of high-fidelity images from text prompts. However, a domain gap exists between generated images and real-world images, which makes it challenging to generate high-quality variations of real-world images. Our investigation reveals that this domain gap originates from a gap between the latent distributions of different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that uses diffusion models to generate image variations from a single image exemplar. Our pipeline improves the quality of the generated variations by aligning the image generation process to the source image's inversion chain. Specifically, we demonstrate that step-wise latent distribution alignment is essential for generating high-quality variations. To achieve this, we design a cross-image self-attention injection for feature interaction and a step-wise distribution normalization to align the latent features. Incorporating these alignment processes into a diffusion model allows RIVAL to generate high-quality image variations without further parameter optimization. Experimental results demonstrate that our approach outperforms existing methods in semantic-condition similarity and perceptual quality. Furthermore, this generalized inference pipeline can be easily applied to other diffusion-based generation tasks, such as image-conditioned text-to-image generation and example-based image inpainting.
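The two alignment components named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the normalization is taken to be per-channel mean and standard-deviation matching between the generated latent and the source latent at the same step, and the cross-image self-attention is taken to let generated-image queries attend over keys and values concatenated from both images. The function names `align_latent` and `cross_image_self_attention` are hypothetical and do not come from the authors' code.

```python
import numpy as np

def align_latent(gen_latent, src_latent, eps=1e-6):
    """Assumed form of step-wise distribution normalization.

    Shifts and scales each channel of gen_latent (C, H, W) so that its
    spatial mean and std match those of src_latent at the same step.
    """
    axes = (-2, -1)
    g_mean = gen_latent.mean(axis=axes, keepdims=True)
    g_std = gen_latent.std(axis=axes, keepdims=True)
    s_mean = src_latent.mean(axis=axes, keepdims=True)
    s_std = src_latent.std(axis=axes, keepdims=True)
    return (gen_latent - g_mean) / (g_std + eps) * s_std + s_mean

def cross_image_self_attention(q_gen, k_gen, v_gen, k_src, v_src):
    """Assumed form of cross-image self-attention injection.

    Queries come from the generated image (N_gen, D); keys and values
    are the source image's tokens concatenated with the generated ones.
    """
    k = np.concatenate([k_src, k_gen], axis=0)
    v = np.concatenate([v_src, v_gen], axis=0)
    scores = q_gen @ k.T / np.sqrt(q_gen.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    return weights @ v
```

In a full pipeline, both functions would presumably be applied at every denoising step, using the latent and attention features stored from the source image's inversion chain at that step. That wiring is not specified in the abstract and is omitted here.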



The Stable Artist: Steering Semantics in Diffusion Latent Space

Large, text-conditioned generative diffusion models have recently gained...

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models

Text-to-image models give rise to workflows which often begin with an ex...

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

The remarkable capabilities of pretrained image diffusion models have be...

Optimal Linear Subspace Search: Learning to Construct Fast and High-Quality Schedulers for Diffusion Models

In recent years, diffusion models have become the most popular and power...

3DGen: Triplane Latent Diffusion for Textured Mesh Generation

Latent diffusion models for image generation have crossed a quality thre...

PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding

Generating an image from a provided descriptive text is quite a challeng...

Accuracy and Fidelity Comparison of Luna and DALL-E 2 Diffusion-Based Image Generation Systems

We qualitatively examine the accuracy and fidelity between two diffusion...
