Towards Practical Plug-and-Play Diffusion Models

by Hyojun Go et al.

Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to control the generation process in a plug-and-play manner for various tasks without fine-tuning the diffusion model. However, directly using publicly available off-the-shelf models for guidance fails because they perform poorly on noisy inputs. The existing practice is therefore to fine-tune the guidance models on labeled data corrupted with noise. In this paper, we argue that this practice has two limitations: (1) handling inputs with widely varying noise levels is too difficult for a single model; (2) collecting labeled datasets hinders scaling to various tasks. To tackle these limitations, we propose a novel strategy that leverages multiple experts, where each expert specializes in a particular noise range and guides the reverse process at the corresponding timesteps. However, since it is infeasible to manage multiple networks and rely on labeled data, we present a practical guidance framework, termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We conduct extensive ImageNet class-conditional generation experiments to show that our method can successfully guide diffusion with few trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide the publicly available GLIDE through our framework in a plug-and-play manner.
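The multi-expert idea in the abstract can be illustrated with a minimal sketch: the diffusion timesteps are partitioned into contiguous noise ranges, and each range is routed to a dedicated expert guidance model. The function names and the equal-width partitioning below are illustrative assumptions, not the authors' actual API or training scheme.

```python
# Minimal sketch (assumptions: equal-width timestep partition, hypothetical names).
# Each expert guides the reverse process only within its assigned noise range.

def make_expert_selector(num_timesteps: int, num_experts: int):
    """Return a function mapping a timestep t to the index of the expert
    whose noise range covers t."""
    width = num_timesteps / num_experts

    def select(t: int) -> int:
        assert 0 <= t < num_timesteps, "timestep out of range"
        # Clamp to the last expert to guard against floating-point edge cases.
        return min(int(t // width), num_experts - 1)

    return select

# Example: 1000 diffusion timesteps split across 5 experts (200 steps each).
select = make_expert_selector(num_timesteps=1000, num_experts=5)
print(select(0), select(199), select(200), select(999))  # → 0 0 1 4
```

During sampling, the reverse process at timestep `t` would then query only `experts[select(t)]` for its guidance signal, so no single model has to cope with the full spectrum of noise levels.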

