Grounded Text-to-Image Synthesis with Attention Refocusing

06/08/2023
by   Quynh Phung, et al.
0

Driven by scalable diffusion models trained on large-scale paired text-image datasets, text-to-image synthesis methods have shown compelling results. However, these models still fail to precisely follow the text prompt when multiple objects, attributes, and spatial compositions are involved in the prompt. In this paper, we identify the potential reasons in both the cross-attention and self-attention layers of the diffusion model. We propose two novel losses to refocus the attention maps according to a given layout during the sampling process. We perform comprehensive experiments on the DrawBench and HRS benchmarks using layouts synthesized by Large Language Models, showing that our proposed losses can be integrated easily and effectively into existing text-to-image methods and consistently improve their alignment between the generated images and the text prompts.

READ FULL TEXT

page 1

page 3

page 4

page 7

page 8

page 9

page 10

research
03/25/2023

Freestyle Layout-to-Image Synthesis

Typical layout-to-image synthesis (LIS) models generate images for a clo...
research
12/09/2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on t...
research
09/29/2022

DreamFusion: Text-to-3D using 2D Diffusion

Recent breakthroughs in text-to-image synthesis have been driven by diff...
research
01/23/2023

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Text-to-image synthesis has recently seen significant progress thanks to...
research
09/08/2023

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

Recent advancements in diffusion models have showcased their impressive ...
research
06/26/2023

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

While recent developments in text-to-image generative models have led to...
research
06/14/2023

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

Diffusion models (DMs) have recently gained attention with state-of-the-...

Please sign up or login with your details

Forgot password? Click here to reset