DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation

05/05/2023
by Hong Chen, et al.

Given a small set of images of a specific subject, subject-driven text-to-image generation aims to generate customized images of the subject according to new text descriptions, a task that has recently attracted increasing attention in the community. Current subject-driven text-to-image generation methods are mainly based on finetuning a pretrained large-scale text-to-image generation model. However, these finetuning methods map the images of the subject into an embedding that is highly entangled with subject-identity-unrelated information, which may result in generated images that are inconsistent with the text descriptions or that drift from the subject's identity. To tackle this problem, we propose DisenBooth, a disentangled parameter-efficient tuning framework for subject-driven text-to-image generation. By disentangling the embedding into an identity-related part and an identity-unrelated part, DisenBooth can generate new images that simultaneously preserve the subject identity and conform to the text descriptions. Specifically, DisenBooth is built on a pretrained diffusion model and conducts finetuning in the diffusion denoising process, where a shared identity embedding and an image-specific identity-unrelated embedding are used jointly to denoise each image. To make the two embeddings disentangled, two auxiliary objectives are proposed. Additionally, a parameter-efficient finetuning strategy is adopted to improve finetuning efficiency. Extensive experiments show that DisenBooth faithfully learns well-disentangled identity-related and identity-unrelated embeddings. With the shared identity embedding, DisenBooth demonstrates superior subject-driven text-to-image generation ability. Moreover, combining the disentangled embeddings in different ways provides a more flexible and controllable generation framework.
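The abstract only sketches the training setup, so the following toy PyTorch sketch illustrates one plausible reading of it: a shared identity embedding and per-image identity-unrelated embeddings are optimized jointly under a denoising loss, with two auxiliary terms encouraging disentanglement. Everything here (ToyDenoiser, the embedding and latent sizes, the noise schedule, the loss weights, and the exact form of the two auxiliary objectives) is an illustrative assumption, not the paper's implementation; the actual method finetunes a pretrained latent-diffusion U-Net, which is replaced by a small stand-in network below.

```python
# Toy sketch of the loss composition described in the abstract (assumptions
# throughout). A real implementation would use a pretrained diffusion U-Net
# and a proper noise schedule; this stand-in only shows how the combined
# embedding, the identity-only denoising term, and a disentanglement penalty
# could fit together in one training step.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_IMAGES, EMB_DIM, LATENT_DIM = 5, 64, 128  # toy sizes (assumptions)

class ToyDenoiser(nn.Module):
    """Stand-in for the pretrained diffusion U-Net (assumption)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + EMB_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM))

    def forward(self, x_t, cond, t):
        t = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([x_t, cond, t], dim=-1))

denoiser = ToyDenoiser()
# Shared identity-related embedding (one per subject) and image-specific
# identity-unrelated embeddings (one per training image), both learned.
f_id = nn.Parameter(torch.randn(EMB_DIM))
f_img = nn.Parameter(torch.randn(N_IMAGES, EMB_DIM))
opt = torch.optim.AdamW([f_id, f_img] + list(denoiser.parameters()), lr=1e-4)

def training_step(x0, idx):
    """One denoising step over the subject images (toy noise schedule)."""
    noise = torch.randn_like(x0)
    t = torch.randint(0, 1000, (x0.size(0),))
    alpha = (1.0 - t.float() / 1000.0).unsqueeze(-1)
    x_t = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise

    # Main objective: denoise with the combined (identity + per-image)
    # embedding, as both embeddings are used jointly for each image.
    cond = f_id.expand_as(f_img[idx]) + f_img[idx]
    loss_denoise = F.mse_loss(denoiser(x_t, cond, t), noise)

    # Auxiliary objective 1 (assumed form): the shared identity embedding
    # alone should still roughly denoise, pushing identity into f_id.
    pred_id = denoiser(x_t, f_id.expand(x0.size(0), -1), t)
    loss_weak = F.mse_loss(pred_id, noise)

    # Auxiliary objective 2 (assumed form): penalize overlap between the
    # identity-related and identity-unrelated embeddings.
    loss_cos = F.cosine_similarity(
        f_id.expand_as(f_img[idx]), f_img[idx], dim=-1).abs().mean()

    loss = loss_denoise + 0.01 * loss_weak + 0.001 * loss_cos  # toy weights
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x0 = torch.randn(N_IMAGES, LATENT_DIM)  # stand-in latents of subject images
for step in range(3):
    print(training_step(x0, torch.arange(N_IMAGES)))
```

In the actual method one would expect the shared embedding to enter through the text-conditioning pathway of the U-Net and the finetuning to touch only a small parameter subset (e.g., low-rank adapters, as one common parameter-efficient choice); the toy loop above only demonstrates how a combined denoising loss, an identity-only denoising term, and a similarity penalty can together pull identity-related information into the shared embedding.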


