A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

07/24/2023
by Jindong Gu, et al.

Prompt engineering is a technique for adapting a large pre-trained model to new tasks by augmenting its input with task-specific hints, known as prompts. Prompts can be created manually as natural-language instructions or generated automatically, either as natural-language instructions or as vector representations. Prompt engineering enables predictions based solely on prompts, without updating model parameters, and thereby eases the application of large pre-trained models to real-world tasks. In recent years, prompt engineering has been well studied in natural language processing, and it has more recently been studied intensively in vision-language modeling. However, a systematic overview of prompt engineering on pre-trained vision-language models is still lacking. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g., Flamingo), image-text matching models (e.g., CLIP), and text-to-image generation models (e.g., Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are discussed. Finally, the challenges, future directions, and research opportunities are summarized to foster future research on this topic.
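To make the manual-prompting idea concrete, the zero-shot classification recipe commonly used with image-text matching models such as CLIP can be sketched as follows. This is a minimal illustration, not code from the survey: the template string is a typical hand-written prompt, and the 4-dimensional embeddings are made up for the demo (a real system would use the frozen text and image encoders of a model like CLIP).

```python
import numpy as np

# Hand-written prompt template (an assumption for illustration):
# wrapping a raw class label in natural language tends to match the
# image-caption distribution the model was pre-trained on.
TEMPLATE = "a photo of a {}."

def build_prompts(class_names):
    """Turn raw class labels into natural-language prompts."""
    return [TEMPLATE.format(name) for name in class_names]

def classify(image_embedding, prompt_embeddings, class_names):
    """Assign the class whose prompt embedding has the highest
    cosine similarity with the image embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = prompt_embeddings / np.linalg.norm(
        prompt_embeddings, axis=1, keepdims=True)
    scores = txt @ img  # one similarity score per class prompt
    return class_names[int(np.argmax(scores))]

# Toy demonstration with made-up 4-d embeddings.
classes = ["cat", "dog"]
prompts = build_prompts(classes)  # ['a photo of a cat.', 'a photo of a dog.']
prompt_emb = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])
image_emb = np.array([0.9, 0.1, 0.0, 0.0])  # closest to the "cat" prompt
print(classify(image_emb, prompt_emb, classes))  # cat
```

Note that no model parameter is updated anywhere in this recipe: adapting to a new label set only requires changing the prompt strings, which is exactly the appeal of prompting described above.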


