Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

12/09/2022
by Weixi Feng, et al.

Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality and creative images, we observe that attribute binding and compositional capabilities remain major challenges, especially when multiple objects are involved. In this work, we improve the compositional skills of T2I models, specifically targeting more accurate attribute binding and better image compositions. To do this, we incorporate linguistic structures into the diffusion guidance process, exploiting the controllability offered by manipulating cross-attention layers in diffusion-based T2I models. We observe that keys and values in cross-attention layers carry strong semantic meaning associated with object layouts and content. Therefore, we can better preserve the compositional semantics in the generated image by manipulating the cross-attention representations based on linguistic insights. Built upon Stable Diffusion, a SOTA T2I model, our structured cross-attention design is efficient and requires no additional training samples. We achieve better compositional skills in qualitative and quantitative results, leading to a 5-8% advantage in head-to-head user comparison studies. Lastly, we conduct an in-depth analysis to reveal potential causes of incorrect image compositions and justify the properties of cross-attention layers in the generation process.
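
To make the idea of manipulating cross-attention keys and values more concrete, here is a minimal NumPy sketch of one way such a training-free scheme could look. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulation, so the choices here are hypothetical. Attention maps are computed once from the full-prompt keys, and the outputs are averaged over several value matrices, each with one phrase span re-encoded in isolation. The learned key/value projection layers are omitted, random arrays stand in for text-encoder outputs, and the function names (structured_cross_attention, substitute_span) are invented for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_maps(queries, keys):
    # Scaled dot-product attention weights, shape (n_pixels, n_tokens).
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d), axis=-1)

def substitute_span(full_values, span_values, start):
    # Assumption: overwrite the token positions of one phrase with an
    # embedding of that phrase encoded on its own.
    out = full_values.copy()
    out[start:start + len(span_values)] = span_values
    return out

def structured_cross_attention(queries, keys, values_list):
    # Assumption: one shared attention map (layout), averaged over
    # several value matrices (content). No weights are trained.
    attn = cross_attention_maps(queries, keys)
    outputs = [attn @ v for v in values_list]
    return np.mean(outputs, axis=0)

# Toy stand-ins for image queries and text embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
n_pixels, n_tokens, dim = 16, 8, 4
queries = rng.normal(size=(n_pixels, dim))
full = rng.normal(size=(n_tokens, dim))      # full-prompt embedding
span_a = rng.normal(size=(2, dim))           # e.g. a phrase like "a red car"
span_b = rng.normal(size=(3, dim))           # e.g. a phrase like "a white sheep"

values_list = [
    full,
    substitute_span(full, span_a, start=1),
    substitute_span(full, span_b, start=4),
]
out = structured_cross_attention(queries, full, values_list)
baseline = structured_cross_attention(queries, full, [full])
```

With a single value matrix the function reduces to ordinary cross-attention, so the sketch changes only how content is mixed in, not where attention falls. That mirrors the abstract's observation that keys and values are associated with layout and content, though the exact division of roles here is an assumption.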


