SpaText: Spatio-Textual Representation for Controllable Image Generation

11/25/2022
by Omri Avrahami, et al.

Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText, a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to the lack of large-scale datasets that provide a detailed textual description for each region in the image, we choose to leverage current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, showing its effectiveness on two state-of-the-art diffusion models: a pixel-based one and a latent-based one. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method, showing that it achieves state-of-the-art results on image generation with free-form textual scene control.
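Two ingredients mentioned in the abstract lend themselves to a short sketch: the CLIP-based spatio-textual representation (a spatial map holding a CLIP text embedding for each annotated region) and a multi-conditional extension of classifier-free guidance. Below is a minimal, hypothetical Python sketch of how such a representation could be assembled and how guidance over two conditions could be chained. The function names, the use of the OpenAI `clip` package, and the guidance scales are illustrative assumptions, not the authors' released code or reported settings.

    # Hypothetical sketch (not the authors' released code): build a CLIP-based
    # spatio-textual tensor from a segmentation map whose regions are annotated
    # with free-form prompts, assuming PyTorch and the OpenAI `clip` package.
    import torch
    import clip


    def build_spatio_textual_tensor(seg_map, region_prompts, clip_model, device="cpu"):
        """seg_map: (H, W) integer tensor, 0 = unannotated, k > 0 = region k.
        region_prompts: dict {region id: free-form text description}.
        Returns an (H, W, D) tensor with one CLIP text embedding per annotated
        pixel and zeros elsewhere."""
        H, W = seg_map.shape
        D = clip_model.text_projection.shape[1]  # CLIP text embedding dimension
        rep = torch.zeros(H, W, D, device=device)
        with torch.no_grad():
            for region_id, prompt in region_prompts.items():
                tokens = clip.tokenize([prompt]).to(device)
                emb = clip_model.encode_text(tokens)[0].float()
                emb = emb / emb.norm()              # unit-normalize the embedding
                rep[seg_map == region_id] = emb     # stamp it into the region's pixels
        return rep


    def multi_conditional_cfg(eps_uncond, eps_text, eps_text_and_spatial,
                              scale_text=7.5, scale_spatial=4.0):
        """One plausible chained extension of classifier-free guidance to two
        conditions (global text prompt plus spatio-textual map); the scales
        here are placeholders, not values from the paper."""
        return (eps_uncond
                + scale_text * (eps_text - eps_uncond)
                + scale_spatial * (eps_text_and_spatial - eps_text))

In this sketch, the eps_* arguments stand for the denoiser's noise predictions under the corresponding conditioning (none, text only, text plus spatio-textual map); at each diffusion step the combined prediction would replace the usual single-condition guided prediction.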


Related research

08/09/2023  LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
In the text-to-image generation field, recent remarkable progress in Sta...

07/25/2023  Composite Diffusion | whole >= Σparts
For an artist or a graphic designer, the spatial layout of a scene is a ...

03/24/2022  Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Recent text-to-image generation methods provide a simple yet exciting co...

08/24/2023  Dense Text-to-Image Generation with Attention Modulation
Existing text-to-image diffusion models struggle to synthesize realistic...

03/01/2023  Collage Diffusion
Text-conditional diffusion models generate high-quality, diverse images....

05/05/2023  Guided Image Synthesis via Initial Image Editing in Diffusion Model
Diffusion models have the ability to generate high quality images by den...

02/09/2023  Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation
Text-to-image generation models represent the next step of evolution in ...
