M-VADER: A Model for Diffusion with Multimodal Context

by Samuel Weinbach, et al.

We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13-billion-parameter multimodal decoder combining components from the autoregressive vision-language model MAGMA and biases finetuned for semantic search.




GenAssist: Making Image Generation Accessible

Blind and low vision (BLV) creators use images to communicate with sight...

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

Text design is one of the most critical procedures in poster design, as ...

Planting a SEED of Vision in Large Language Model

We present SEED, an elaborate image tokenizer that empowers Large Langua...

Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned

This paper focuses on enhancing the captions generated by image-caption ...

Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation

Recently, diffusion models have been proven to perform remarkably well i...

Collage Diffusion

Text-conditional diffusion models generate high-quality, diverse images....

AIwriting: Relations Between Image Generation and Digital Writing

During 2022, both transformer-based AI text generation systems such as ...
