CLIPAG: Towards Generator-Free Text-to-Image Generation

06/29/2023
by Roy Ganz, et al.

Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning. While this phenomenon has attracted significant research attention, it has so far been studied only in the context of unimodal, vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundation of diverse image-text tasks and applications. Through adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG, in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, a task that typically requires huge generators.
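Generator-free text-to-image generation, as described in the abstract, amounts to treating the image pixels themselves as the optimization variable and ascending an image-text similarity score. The following is a minimal sketch of only that mechanic, under loudly stated assumptions: a hypothetical fixed linear map (`make_toy_encoder`) stands in for CLIPAG's image encoder, a hand-written vector stands in for a CLIP text embedding, and the objective is plain cosine similarity with a hand-derived gradient and pixel clamping to [0, 1]. None of these choices come from the paper, whose exact procedure is not given in this abstract. A toy linear encoder also cannot exhibit PAG itself, which is the property claimed to make such gradients yield perceptually meaningful images rather than adversarial noise.

```python
import math
import random


def make_toy_encoder(dim_in, dim_out, seed=0):
    """Hypothetical stand-in for a robust CLIP image encoder: a fixed random linear map."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def matvec(w, x):
    """Compute u = W x (the toy 'image embedding')."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine(u, v):
    """Cosine similarity, assumed here as the image-text matching score."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad_wrt_x(w, x, t):
    """Analytic gradient of cos(W x, t) with respect to the image x."""
    u = matvec(w, x)
    nu, nt = norm(u), norm(t)
    dot = sum(a * b for a, b in zip(u, t))
    # d cos / d u = t / (|u| |t|) - (u . t) u / (|u|^3 |t|)
    du = [tb / (nu * nt) - dot * ua / (nu ** 3 * nt) for ua, tb in zip(u, t)]
    # Chain rule through u = W x gives d cos / d x = W^T (d cos / d u)
    return [sum(w[j][i] * du[j] for j in range(len(w))) for i in range(len(x))]


def generate(w, text_emb, x0, steps=300, lr=0.02):
    """Generator-free synthesis: projected gradient ascent on the pixels themselves."""
    x = list(x0)
    for _ in range(steps):
        g = cosine_grad_wrt_x(w, x, text_emb)
        # Step toward higher image-text similarity, then clamp pixels to [0, 1]
        x = [min(1.0, max(0.0, xi + lr * gi)) for xi, gi in zip(x, g)]
    return x


# Toy usage: a 16-"pixel" image and a 4-dimensional joint embedding space.
W = make_toy_encoder(dim_in=16, dim_out=4)
text_emb = [1.0, -0.5, 0.25, 0.0]  # hypothetical text embedding
rng = random.Random(1)
x0 = [rng.uniform(0.4, 0.6) for _ in range(16)]  # near-gray starting image
x_gen = generate(W, text_emb, x0)
print("similarity before:", round(cosine(matvec(W, x0), text_emb), 3))
print("similarity after: ", round(cosine(matvec(W, x_gen), text_emb), 3))
```

With a real robust CLIP model, one would presumably obtain the gradient from an automatic-differentiation framework rather than deriving it by hand; the clamp simply keeps the optimized pixels in a valid range.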

