MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

11/16/2022
by   Tianhong Li, et al.

Generative modeling and representation learning are two key tasks in computer vision. However, these models are typically trained independently, which ignores the potential for each task to help the other and incurs training and model-maintenance overhead. In this work, we propose the MAsked Generative Encoder (MAGE), the first framework to unify SOTA image generation and self-supervised representation learning. Our key insight is that using variable masking ratios in masked image modeling pre-training allows generative training (very high masking ratio) and representation learning (lower masking ratio) under the same training framework. Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN at inputs and outputs, combining this with masking. We can further improve the representations by adding a contrastive loss to the encoder output. We extensively evaluate the generation and representation learning capabilities of MAGE. On ImageNet-1K, a single MAGE ViT-L model obtains 9.10 FID on class-unconditional image generation and 78.9% top-1 accuracy for linear probing, achieving state-of-the-art performance in both image generation and representation learning. Code is available at https://github.com/LTH14/mage.
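The variable-masking idea above can be sketched in a few lines: each training step draws a masking ratio from a truncated Gaussian and replaces that fraction of the VQGAN semantic tokens with a mask token. This is a minimal illustrative sketch, not the paper's implementation; the distribution parameters (`mu`, `sigma`, bounds), the `MASK_TOKEN` sentinel, and the function names are assumptions made for illustration.

```python
import random

# Hypothetical sentinel standing in for the learnable [MASK] token id.
MASK_TOKEN = -1

def sample_masking_ratio(mu=0.55, sigma=0.25, lo=0.5, hi=1.0, rng=random):
    """Draw a masking ratio from a truncated Gaussian (illustrative values).

    Draws near 1.0 correspond to the generative regime (almost everything
    masked); draws near the lower bound favor representation learning.
    """
    while True:
        r = rng.gauss(mu, sigma)
        if lo <= r <= hi:
            return r

def mask_tokens(tokens, ratio=None, rng=random):
    """Replace a random subset of semantic (VQ) token ids with MASK_TOKEN.

    Returns the masked sequence and the sorted indices that were masked,
    so a loss can be computed only at masked positions.
    """
    if ratio is None:
        ratio = sample_masking_ratio(rng=rng)
    n = len(tokens)
    k = int(n * ratio)
    masked_idx = set(rng.sample(range(n), k))
    masked = [MASK_TOKEN if i in masked_idx else tokens[i] for i in range(n)]
    return masked, sorted(masked_idx)
```

With a single masking operator, the same encoder-decoder can be trained for both tasks: sampling a high ratio trains it to synthesize content from almost nothing, while a moderate ratio forces it to build semantic representations of the visible tokens.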


