Training Diffusion Models with Reinforcement Learning

05/22/2023
by Kevin Black, et al.

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation.
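
To make the "denoising as a multi-step decision-making problem" framing concrete, the following is a minimal toy sketch, not the paper's implementation. It treats each denoising step as a Gaussian policy action and applies a score-function (REINFORCE) policy gradient, using only the reward of the final sample. Everything here is an assumption made for illustration: the one-dimensional state, the per-step learnable shift `theta[t]`, the fixed noise scale `sigma`, the quadratic reward, and the function names `sample_trajectories`, `reinforce_gradient`, and `train`.

```python
import numpy as np


def sample_trajectories(theta, sigma, batch, rng):
    """Run a toy reverse (denoising) chain and record the noise of each step.

    Each step is a Gaussian policy: x_{t-1} ~ N(x_t + theta[t], sigma^2).
    The learnable per-step shift theta[t] is a stand-in for a denoising network.
    """
    num_steps = theta.shape[0]
    x = rng.standard_normal(batch)                   # x_T ~ N(0, 1)
    eps = rng.standard_normal((num_steps, batch))    # per-step sampling noise
    for t in range(num_steps):
        x = x + theta[t] + sigma * eps[t]
    return x, eps                                    # final samples x_0, noise


def reinforce_gradient(eps, rewards, sigma):
    """Score-function estimate of d E[r(x_0)] / d theta.

    For the Gaussian step above, d log p(x_{t-1} | x_t) / d theta[t]
    equals (x_{t-1} - x_t - theta[t]) / sigma^2 = eps[t] / sigma.
    A batch-mean baseline is subtracted to reduce variance.
    """
    advantages = rewards - rewards.mean()
    return (eps / sigma * advantages).mean(axis=1)


def train(target=3.0, num_steps=5, sigma=0.5, batch=512,
          iterations=300, lr=0.01, seed=0):
    """Gradient ascent on the expected reward r(x_0) = -(x_0 - target)^2."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(num_steps)
    for _ in range(iterations):
        x0, eps = sample_trajectories(theta, sigma, batch, rng)
        rewards = -(x0 - target) ** 2                # reward only on final sample
        theta += lr * reinforce_gradient(eps, rewards, sigma)
    return theta


if __name__ == "__main__":
    theta = train()
    print("learned total shift:", theta.sum())
```

The reward is never differentiated in this sketch; only the log-probabilities of the sampled denoising steps are. That property is what would let the same recipe apply to black-box objectives such as compressibility or scores from a separate model. In this toy setting the summed shift `theta.sum()` should drift toward `target` as training proceeds.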


