Exploring Transformer Backbones for Image Diffusion Models

12/27/2022
by Princy Chahal, et al.

We present an end-to-end Transformer-based Latent Diffusion model for image synthesis. On the ImageNet class-conditioned generation task, the Transformer-based Latent Diffusion model achieves an FID of 14.1, comparable to the 13.1 FID of a UNet-based architecture. Beyond demonstrating Transformers for diffusion-based image synthesis, this architectural simplification enables straightforward fusion and joint modeling of text and image data: the multi-head self-attention mechanism of Transformers lets image and text features interact directly, removing the need for the cross-attention mechanism used in UNet-based diffusion models.
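The fusion mechanism described above can be illustrated with a minimal NumPy sketch: image latent tokens and text tokens are concatenated into one sequence and passed through ordinary self-attention, so image tokens attend to text tokens in the same operation, with no separate cross-attention module. All names, shapes, and weight initializations here are illustrative assumptions, not details from the paper.

```python
import numpy as np

def self_attention(tokens, W_q, W_k, W_v):
    """Single-head self-attention: every token attends to every token."""
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
d = 16                                     # illustrative embedding width
image_tokens = rng.normal(size=(64, d))    # e.g. an 8x8 grid of latent patches
text_tokens = rng.normal(size=(8, d))      # e.g. 8 text-conditioning embeddings

# Joint sequence: text and image share one attention operation
tokens = np.concatenate([image_tokens, text_tokens], axis=0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(tokens, W_q, W_k, W_v)

# The first 64 output rows are image tokens that have already mixed in
# text information, without any dedicated cross-attention layer.
print(out.shape)  # (72, 16)
```

A UNet-based diffusion model would instead insert cross-attention blocks where image features query text features; here that interaction falls out of plain self-attention over the concatenated sequence.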

Related research:

- Scalable Diffusion Models with Transformers (12/19/2022): We explore a new class of diffusion models based on the transformer arch...
- Masked Diffusion Transformer is a Strong Image Synthesizer (03/25/2023): Despite its success in image synthesis, we observe that diffusion probab...
- Array Camera Image Fusion using Physics-Aware Transformers (07/05/2022): We demonstrate a physics-aware transformer for feature-based data fusion...
- U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech (05/22/2023): Deep learning has led to considerable advances in text-to-speech synthes...
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder (06/01/2022): Recently most successful image synthesis models are multi stage process ...
- Retrieval-Augmented Diffusion Models (04/25/2022): Generative image synthesis with diffusion models has recently achieved e...
