Watch Your Steps: Local Image and Scene Editing by Text Instructions

08/17/2023
by Ashkan Mirzaei, et al.

Denoising diffusion models have enabled high-quality image generation and editing. We present a method to localize the desired edit region implicit in a text instruction. We leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. This discrepancy is referred to as the relevance map. The relevance map conveys the importance of changing each pixel to achieve the edits, and is used to guide the modifications. This guidance ensures that the irrelevant pixels remain unchanged. Relevance maps are further used to enhance the quality of text-guided editing of 3D scenes in the form of neural radiance fields. A field, denoted as the relevance field, is trained on the relevance maps of the training views and defines the 3D region within which modifications should be made. We perform iterative updates on the training views guided by rendered relevance maps from the relevance field. Our method achieves state-of-the-art performance on both image and NeRF editing tasks. Project page: https://ashmrz.github.io/WatchYourSteps/
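
The relevance-map idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the channel-averaged absolute difference, the max-normalization, and the 0.5 threshold are all assumptions made for illustration, and random arrays stand in for real IP2P predictions.

```python
import numpy as np

def relevance_map(pred_with, pred_without, eps=1e-8):
    """Per-pixel discrepancy between two (C, H, W) predictions, scaled to [0, 1].

    Hypothetical helper: averages the absolute difference over channels and
    divides by the maximum. The paper's exact normalization may differ.
    """
    diff = np.abs(pred_with - pred_without).mean(axis=0)  # shape (H, W)
    return diff / (diff.max() + eps)

def guided_blend(original, edited, relevance, threshold=0.5):
    """Take edited pixels where relevance is high; keep the original elsewhere.

    The threshold value is an assumed hyperparameter.
    """
    mask = (relevance >= threshold).astype(original.dtype)  # (H, W), broadcasts over C
    return mask * edited + (1.0 - mask) * original

# Toy stand-ins for predictions made without and with the instruction.
rng = np.random.default_rng(0)
pred_without = rng.normal(size=(4, 8, 8))
pred_with = pred_without.copy()
pred_with[:, 2:5, 2:5] += 1.0  # the instruction only affects a 3x3 patch

rel = relevance_map(pred_with, pred_without)

original = np.zeros((3, 8, 8))  # placeholder input image
edited = np.ones((3, 8, 8))     # placeholder unconstrained edit
result = guided_blend(original, edited, rel)
```

In this toy setup only the 3x3 patch where the two predictions disagree is modified, and every other pixel keeps its original value, which is the behavior the abstract describes for irrelevant pixels.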


research
07/26/2023

Visual Instruction Inversion: Image Editing via Visual Prompting

Text-conditioned image editing has emerged as a powerful tool for editin...
research
11/25/2022

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

Text-guided diffusion models have shown superior performance in image/vi...
research
03/22/2023

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

We propose a method for editing NeRF scenes with text-instructions. Give...
research
05/27/2023

FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference

Due to the recent success of diffusion models, text-to-image generation ...
research
08/28/2023

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Music editing primarily entails the modification of instrument tracks or...
research
08/11/2020

Text as Neural Operator: Image Manipulation by Text Instruction

In this paper, we study a new task that allows users to edit an input im...
research
03/24/2022

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Recent text-to-image generation methods provide a simple yet exciting co...
