Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation

06/14/2023
by Yongqi Yang, et al.

Text-to-Image (T2I) generation with diffusion models allows users to control the semantic content of synthesized images through text conditions. As a further step toward more customized image creation, we introduce a new multi-modality generation setting that synthesizes images based not only on semantic-level textual input but also on pixel-level visual conditions. Existing approaches first convert the given visual information into a semantic-level representation by connecting it to language, and then incorporate it into the denoising process. Though seemingly intuitive, this design loses the pixel-level values during the semantic transition and therefore fails in scenarios where low-level visual details must be preserved (e.g., the identity of a given face image). To this end, we propose Cyclic One-Way Diffusion (COW), a training-free framework for creating customized images under both semantic text and pixel-level visual conditioning. Notably, we observe that sub-regions of an image interfere with one another along the denoising trajectory, much like physical diffusion, until they reach harmony. We therefore propose to reuse the given visual condition cyclically: we plant it as a high-concentration “seed” at the initialization step of the denoising process and “diffuse” it into a harmonious picture by enforcing a one-way information flow from the visual condition to the rest of the image. We repeat this destroy-and-construct process multiple times to gradually but steadily impose the internal diffusion within the image. Experiments on the challenging one-shot face- and text-conditioned image synthesis task demonstrate that our method outperforms learning-based text-vision conditional methods in speed, image quality, and conditional fidelity.
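Below is a minimal PyTorch sketch of how such a cyclic one-way denoising loop could look. It is not the authors' released implementation: the denoise_step/add_noise interface, the timestep bounds t_high/t_low, and the number of cycles are illustrative assumptions, and a real setup would wire in a Stable-Diffusion-style UNet and noise scheduler.

import torch


def cyclic_one_way_diffusion(
    seed_latent,          # full-size latent with the encoded visual condition placed in the seed region
    seed_mask,            # binary mask (1 inside the seed region, 0 elsewhere)
    text_emb,             # text-condition embedding passed to the denoiser
    denoise_step,         # callable(x, t, text_emb) -> x at step t-1   (hypothetical interface)
    add_noise,            # callable(x, t) -> x noised to level t       (hypothetical interface)
    num_cycles=5,         # number of destroy-and-construct cycles (assumed value)
    t_high=800,           # noise level each cycle is pushed back to ("destroy")
    t_low=300,            # noise level each cycle denoises down to ("construct")
    latent_shape=(1, 4, 64, 64),
):
    # Initialization: start from pure noise and plant the visual condition
    # as a high-concentration "seed" inside the mask region.
    x = torch.randn(latent_shape)
    x = seed_mask * add_noise(seed_latent, t_high) + (1 - seed_mask) * x

    for _ in range(num_cycles):
        # "Construct": denoise from the high noise level down to a lower one.
        for t in range(t_high, t_low, -1):
            x = denoise_step(x, t, text_emb)
            # One-way information flow: re-plant the (matchingly noised) visual
            # condition in the seed region after every step, so information only
            # diffuses outward from the seed, never back into it.
            x = seed_mask * add_noise(seed_latent, t - 1) + (1 - seed_mask) * x
        # "Destroy": push the partially denoised latent back to the high noise level.
        x = add_noise(x, t_high)

    # Final pass: denoise all the way down to a clean latent.
    for t in range(t_high, 0, -1):
        x = denoise_step(x, t, text_emb)
    return x


# Toy stand-ins so the sketch runs end-to-end (both callables are placeholders,
# not real diffusion components).
if __name__ == "__main__":
    shape = (1, 4, 64, 64)
    add_noise = lambda x, t: x + 0.01 * (t ** 0.5) * torch.randn_like(x)
    denoise_step = lambda x, t, emb: 0.999 * x
    seed_latent = torch.zeros(shape)
    seed_latent[..., :32, :32] = torch.randn(1, 4, 32, 32)   # seed in the top-left corner
    seed_mask = torch.zeros(shape)
    seed_mask[..., :32, :32] = 1.0
    out = cyclic_one_way_diffusion(seed_latent, seed_mask, torch.zeros(1, 77, 768),
                                   denoise_step, add_noise)
    print(out.shape)   # torch.Size([1, 4, 64, 64])

The design choice mirrored here is that the seed region is overwritten with the appropriately noised visual condition after each denoising step, enforcing the one-way flow, and the whole destroy-and-construct cycle is repeated before a final clean denoising pass.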
