Painter: Teaching Auto-regressive Language Models to Draw Sketches

08/16/2023
by Reza Pourreza, et al.

Large language models (LLMs) have made tremendous progress in natural language understanding, and they have also been successfully adopted in other domains such as computer vision, robotics, and reinforcement learning. In this work, we apply LLMs to image generation by directly generating the virtual brush strokes that paint an image. We present Painter, an LLM that converts user prompts given as text descriptions into sketches by generating the corresponding brush strokes auto-regressively. We build Painter on an off-the-shelf LLM pre-trained on a large text corpus, fine-tuning it on the new task while preserving its language understanding capabilities. We create a dataset of diverse multi-object sketches paired with textual prompts that covers several object types and tasks. Painter can generate sketches from text descriptions, remove objects from the canvas, and detect and classify objects in sketches. Although this is pioneering work in using LLMs for auto-regressive image generation, the results are very encouraging.

