M-VADER: A Model for Diffusion with Multimodal Context

12/06/2022
by Samuel Weinbach, et al.

We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. A number of successful DM image generation algorithms already make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13-billion-parameter multimodal decoder combining components from the autoregressive vision-language model MAGMA and biases finetuned for semantic search.
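To make the idea of a prompt built from "arbitrary combinations of images and text" concrete, here is a minimal toy sketch in plain Python. It is not the M-VADER or S-MAGMA implementation. The names `toy_embed`, `build_context`, and `cross_attend` are hypothetical, the embedding is a deterministic random stand-in, and the use of cross-attention to read the context is an assumption borrowed from common text-to-image diffusion designs; the abstract does not say which conditioning mechanism M-VADER uses.

```python
import math
import random

EMBED_DIM = 8  # toy width; the real embedding size is not stated in the abstract


def toy_embed(item, dim=EMBED_DIM):
    """Hypothetical stand-in for an embedding model: one prompt item -> one vector."""
    rng = random.Random(repr(item))  # seeded by the item, so the output is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def build_context(prompt):
    """Embed an interleaved prompt of ("text", str) and ("image", image_id) items.

    The items may appear in any order and any mix; the result is one sequence
    of vectors that a denoiser could be conditioned on.
    """
    context = []
    for kind, payload in prompt:
        if kind not in ("text", "image"):
            raise ValueError(f"unsupported modality: {kind}")
        context.append(toy_embed((kind, payload)))
    return context


def cross_attend(query, context):
    """Single-head attention read: softmax(q . k / sqrt(d)) weighted sum of context."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in context]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    attended = [sum(w * vec[i] for w, vec in zip(weights, context)) for i in range(d)]
    return attended, weights


# Two images and one text fragment, mixed freely in a single prompt.
prompt = [
    ("image", "photo_a"),
    ("text", "in the style of a watercolor"),
    ("image", "photo_b"),
]
context = build_context(prompt)
latent_query = toy_embed(("latent", 0))  # stand-in for one position of the noisy image
attended, weights = cross_attend(latent_query, context)
```

The point of the sketch is only the data flow: every prompt item, whatever its modality, ends up as a vector in the same context sequence, so text-only, image-only, and mixed prompts are all handled by the same conditioning path.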


