Aligning Text-to-Image Models using Human Feedback

02/23/2023
by   Kimin Lee, et al.

Deep generative models have shown impressive results in text-to-image synthesis. However, current text-to-image models often generate images that are inadequately aligned with text prompts. We propose a fine-tuning method for aligning such models using human feedback, comprising three stages. First, we collect human feedback assessing model output alignment from a set of diverse text prompts. We then use the human-labeled image-text dataset to train a reward function that predicts human feedback. Lastly, the text-to-image model is fine-tuned by maximizing reward-weighted likelihood to improve image-text alignment. Our method generates objects with specified colors, counts and backgrounds more accurately than the pre-trained model. We also analyze several design choices and find that investigating them carefully is important for balancing the tradeoff between alignment and fidelity. Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
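
The third stage, maximizing a reward-weighted likelihood, can be illustrated with a toy sketch. This is not the authors' implementation: a small categorical distribution stands in for the text-to-image model, the rewards are hand-written stand-ins for the learned reward function, and the names `softmax`, `reward_weighted_nll`, and `fine_tune` are hypothetical. It only shows how weighting each sample's log-likelihood by its reward shifts probability toward outputs that were rated as aligned.

```python
import math

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward_weighted_nll(logits, samples, rewards):
    """Average of -r_i * log p(x_i): the negated reward-weighted log-likelihood."""
    probs = softmax(logits)
    return -sum(r * math.log(probs[s]) for s, r in zip(samples, rewards)) / len(samples)

def fine_tune(logits, samples, rewards, lr=0.5, steps=200):
    """Plain gradient descent on the reward-weighted loss (toy stand-in for fine-tuning)."""
    logits = list(logits)
    n = len(samples)
    for _ in range(steps):
        probs = softmax(logits)
        grad = [0.0] * len(logits)
        for s, r in zip(samples, rewards):
            # Gradient of -r * log p[s] with respect to the logits is r * (p - onehot(s)).
            for k in range(len(logits)):
                grad[k] += r * (probs[k] - (1.0 if k == s else 0.0)) / n
        logits = [l - lr * g for l, g in zip(logits, grad)]
    return logits

# Assumed toy data: three possible outputs; only output 0 was rated as aligned.
samples = [0, 1, 2, 0, 1, 2]
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # stand-in for reward-function scores

initial_logits = [0.0, 0.0, 0.0]  # "pre-trained" model: uniform over outputs
tuned_logits = fine_tune(initial_logits, samples, rewards)

initial_loss = reward_weighted_nll(initial_logits, samples, rewards)
tuned_loss = reward_weighted_nll(tuned_logits, samples, rewards)
tuned_probs = softmax(tuned_logits)
```

In this sketch, samples with zero reward contribute nothing to the loss, so the update only raises the likelihood of the highly rated output. A real text-to-image model would replace the categorical distribution with the model's own likelihood or training loss on generated image-text pairs.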

research
05/25/2023

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

Learning from human feedback has been shown to improve text-to-image mod...
research
06/07/2023

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Foundation models are first pre-trained on vast unsupervised datasets an...
research
06/15/2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Recent text-to-image generative models can generate high-fidelity images...
research
08/03/2023

DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models

Recent data-driven image colorization methods have enabled automatic or ...
research
08/01/2023

Domain Adaptation based on Human Feedback for Enhancing Generative Model Denoising Abilities

How can we apply human feedback into generative model? As answer of this...
research
03/25/2023

Better Aligning Text-to-Image Models with Human Preference

Recent years have witnessed a rapid growth of deep generative models, wi...
research
01/28/2023

Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset

It has been shown that accurate representation in media improves the wel...
