Adding 3D Geometry Control to Diffusion Models

06/13/2023
by Wufei Ma, et al.

Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, enabling them to generate images that are even more realistic and diverse. To achieve this, our method exploits ControlNet, which extends diffusion models by conditioning on visual prompts in addition to text prompts. We take 3D objects from a 3D shape repository (e.g., ShapeNet or Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structure of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a wide range of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings. We demonstrate the effectiveness of our method through extensive experiments on the ImageNet-50, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV datasets. The results show that our method significantly outperforms existing approaches across multiple benchmarks (e.g., by 4.6 percentage points on ImageNet-50 using ViT and by 3.5 percentage points on PASCAL3D+ and ObjectNet3D using NeMo).
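
The edge-conditioned generation step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: it assumes a rendered view of a ShapeNet/Objaverse object is already saved as render.png (the file name is hypothetical), and it uses the publicly available Canny-edge ControlNet checkpoint with Stable Diffusion v1.5 via the diffusers library; the paper's exact checkpoints, prompts, and edge-detection thresholds may differ.

```python
# Minimal sketch: edge map of a rendered 3D object as a visual prompt
# for ControlNet-conditioned image generation. Assumes "render.png" is
# a view of a ShapeNet/Objaverse object rendered at a known pose.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Compute the edge map of the rendered view (the visual prompt).
render = cv2.imread("render.png")
edges = cv2.Canny(render, 100, 200)  # thresholds are illustrative
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel

# 2. Load Stable Diffusion with an edge-conditioned ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# 3. Generate a realistic image whose object layout follows the edge map.
image = pipe("a photo of an airplane", image=edge_map).images[0]
image.save("generated.png")
```

Because the camera pose used for rendering is known, each image generated this way inherits an exact 3D pose annotation for free, which is what enables the downstream classification and 3D pose estimation experiments.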
