The CLIP Model is Secretly an Image-to-Prompt Converter

05/22/2023
by Yuxuan Ding, et al.

The Stable Diffusion model is a prominent text-to-image generation model that takes a text prompt as input, encoded with the Contrastive Language-Image Pre-training (CLIP) model. Text prompts, however, cannot easily capture the implicit information carried by a reference image. Existing methods address this limitation through expensive training procedures for image-to-image generation that involve millions of training samples. In contrast, this paper demonstrates that the CLIP model, as used in Stable Diffusion, inherently possesses the ability to convert images into text prompts instantaneously. Such an image-to-prompt conversion is achieved with a linear projection matrix that is computed in closed form. The paper further shows that this capability can be strengthened either by using a small amount of similar-domain training data (approximately 100 images) or by running a few online training steps (around 30 iterations) on the reference images. The resulting method offers a simple and flexible way to bridge the gap between images and text prompts, and it applies to tasks such as image variation and image editing, enabling more effective and seamless interaction between images and textual prompts.
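To make the closed-form idea concrete, the sketch below fits a linear projection that maps CLIP image embeddings into the CLIP text-embedding space, so a reference image can stand in where a prompt embedding is expected. It is only an illustration of the general technique under stated assumptions: the model name, the paired image-caption data, and the pseudo-inverse least-squares fit are choices made for this example, whereas the paper derives its projection matrix in closed form from CLIP itself rather than from paired data.

```python
# Illustrative sketch (not the paper's exact procedure): fit a linear map W that
# projects CLIP image embeddings into the CLIP text-embedding space, so that a
# reference image can be used in place of a text prompt embedding.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
name = "openai/clip-vit-large-patch14"  # the CLIP text encoder used by Stable Diffusion v1.x
clip = CLIPModel.from_pretrained(name).to(device).eval()
processor = CLIPProcessor.from_pretrained(name)


@torch.no_grad()
def embed_pairs(images, captions):
    """Return L2-normalized CLIP image and text embeddings for paired data (assumed available)."""
    img_in = processor(images=images, return_tensors="pt").to(device)
    txt_in = processor(text=captions, return_tensors="pt", padding=True, truncation=True).to(device)
    img = F.normalize(clip.get_image_features(**img_in), dim=-1)
    txt = F.normalize(clip.get_text_features(**txt_in), dim=-1)
    return img, txt


def fit_projection(img_emb, txt_emb):
    """Closed-form least-squares solution of min_W ||img_emb @ W - txt_emb||^2.

    The pseudo-inverse gives the minimum-norm solution in one shot; no gradient
    descent is required, mirroring the "closed form" aspect described above.
    """
    return torch.linalg.pinv(img_emb) @ txt_emb  # shape: (d_image, d_text)


@torch.no_grad()
def image_to_prompt_embedding(image, W):
    """Project a single reference image into the text-embedding space."""
    img_in = processor(images=image, return_tensors="pt").to(device)
    img = F.normalize(clip.get_image_features(**img_in), dim=-1)
    return img @ W
```

Note that Stable Diffusion conditions its UNet on the full sequence of per-token hidden states from the CLIP text encoder rather than on the pooled embedding used here, so wiring the projected vector into the generation pipeline would additionally require mapping it into that token-level representation; the sketch stops at the pooled-embedding level.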

