OBJECT 3DIT: Language-guided 3D-aware Image Editing

07/20/2023
by Oscar Michel, et al.

Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, an editing instruction in language, and the edited image. We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Surprisingly, despite training only on synthetic scenes from OBJECT, the editing capabilities of 3DIT generalize to real-world images.
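
As a rough illustration of the example format described above (an input image, a language instruction, and the edited image), one record could be sketched as follows. This is only a sketch under stated assumptions: the class name, field names, file paths, and the sample instruction are hypothetical and are not taken from the released OBJECT dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingExample:
    """Hypothetical record for one editing triple; not the dataset's real schema."""
    input_image: str   # path to the source image (assumed field name)
    instruction: str   # natural-language editing instruction
    edited_image: str  # path to the image after the edit (assumed field name)


def is_valid(example: EditingExample) -> bool:
    """Return True when all three parts of the triple are non-empty strings."""
    parts = (example.input_image, example.instruction, example.edited_image)
    return all(part.strip() for part in parts)


# Illustrative values only; paths and wording are made up for this sketch.
example = EditingExample(
    input_image="scene_0001/input.png",
    instruction="remove the mug on the table",
    edited_image="scene_0001/edited.png",
)
```

Keeping the three parts in a single immutable record makes it harder to pair an instruction with the wrong target image when examples are shuffled or batched.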

research
03/26/2023

BlobGAN-3D: A Spatially-Disentangled 3D-Aware Generative Model for Indoor Scenes

3D-aware image synthesis has attracted increasing interest as it models ...
research
06/16/2023

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Text-guided image editing is widely needed in daily life, ranging from p...
research
03/23/2023

SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

Despite the great success in 2D editing using user-friendly tools, such ...
research
12/25/2019

Inverse Rendering Techniques for Physically Grounded Image Editing

From a single picture of a scene, people can typically grasp the spatial...
research
03/16/2023

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Incorporating human feedback has been shown to be crucial to align text ...
research
08/17/2019

Neural Re-Simulation for Generating Bounces in Single Images

We introduce a method to generate videos of dynamic virtual objects plau...
research
06/09/2023

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Latent diffusion models (LDMs) exhibit an impressive ability to produce ...