SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis

04/28/2023
by Azade Farshad et al.

Text-conditioned image generation has made significant progress in recent years, first with generative adversarial networks and, more recently, with diffusion models. While diffusion models conditioned on text prompts have produced impressive, high-quality images, accurately capturing fine-grained details of complex prompts, such as the number of instances of a specific object, remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process of the diffusion model that leverages bounding box and segmentation map information at inference time, without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich its nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene-graph-to-image and text-based diffusion models on various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance into the diffusion model sampling process for more accurate text-to-image generation.
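The guidance idea described above can be illustrated with a minimal NumPy sketch: at each sampling step, the sample is nudged by the gradient of a loss that encodes a geometric constraint. This is not the paper's implementation; the toy loss below (penalizing intensity outside a target bounding box) and all function names are illustrative assumptions, standing in for the paper's CLIP-embedding and segmentation losses, and the learned denoiser update that a real diffusion model would apply at every step is omitted.

```python
import numpy as np

def bbox_guidance_grad(x, bbox):
    """Gradient of a toy guidance loss L = 0.5 * ||x * (1 - mask)||^2,
    which penalizes intensity outside the target box (y0, y1, x0, x1).
    Stand-in for the paper's CLIP/segmentation guidance losses."""
    y0, y1, x0, x1 = bbox
    mask = np.zeros_like(x)
    mask[y0:y1, x0:x1] = 1.0
    return x * (1.0 - mask)

def guided_sample(shape=(32, 32), bbox=(8, 24, 8, 24),
                  steps=40, guidance_scale=0.2, seed=0):
    """Start from Gaussian noise and, at every sampling step, take a
    gradient step on the guidance loss. A real sampler would interleave
    this with the learned denoiser's update at each timestep."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for _ in range(steps):
        x = x - guidance_scale * bbox_guidance_grad(x, bbox)
    return x
```

After the loop, intensity outside the box has been driven toward zero while the region inside the box is untouched, mirroring how gradient-based guidance steers sampling toward a layout constraint without retraining the model.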


Related research

- 03/14/2023: Text-to-image Diffusion Model in Generative AI: A Survey
  "This survey reviews text-to-image diffusion models in the context that d..."
- 08/08/2023: 3D Scene Diffusion Guidance using Scene Graphs
  "Guided synthesis of high-quality 3D scenes is a challenging task. Diffus..."
- 03/21/2023: Compositional 3D Scene Generation using Locally Conditioned Diffusion
  "Designing complex 3D scenes has been a tedious, manual process requiring..."
- 10/18/2022: Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
  "Recently, diffusion models have been proven to perform remarkably well i..."
- 07/17/2023: Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation
  "Diffusion models are a powerful class of generative models that can prod..."
- 08/02/2023: Reverse Stable Diffusion: What prompt was used to generate this image?
  "Text-to-image diffusion models such as Stable Diffusion have recently at..."
- 01/17/2023: GLIGEN: Open-Set Grounded Text-to-Image Generation
  "Large-scale text-to-image diffusion models have made amazing advances. H..."
