Link-Context Learning for Multimodal LLMs

08/15/2023
by Yan Tai, et al.

The ability to learn novel concepts from context and deliver appropriate responses is essential in human conversations. Despite current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) being trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which empowers MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this novel approach, we introduce the ISEKAI dataset, composed exclusively of unseen, generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits strong link-context learning capabilities on novel concepts compared with vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.
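
To make the setup concrete, the sketch below shows one plausible way to assemble a link-context prompt: a small support set of image-label demonstrations for a novel concept, interleaved so that each label is explicitly tied to its image, followed by a query image. The `Demonstration` class, the `build_link_context_prompt` helper, the segment format, and the `mllm_generate` call are illustrative assumptions, not the interface released in the paper's repository.

```python
# Minimal sketch of assembling a link-context prompt (illustrative only;
# the prompt format and model wrapper below are assumptions, not the
# authors' released API).
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Demonstration:
    image_path: str  # image depicting the (possibly novel) concept
    label: str       # name linked to that image


def build_link_context_prompt(support: List[Demonstration],
                              query_image: str) -> List[Dict[str, str]]:
    """Interleave support-set image-label pairs so each label is causally
    linked to its image, then append the query image to be classified."""
    segments: List[Dict[str, str]] = []
    for demo in support:
        segments.append({"type": "image", "path": demo.image_path})
        segments.append({"type": "text", "text": f"This is a {demo.label}."})
    segments.append({"type": "image", "path": query_image})
    segments.append({"type": "text", "text": "What is this?"})
    return segments


# Usage: two demonstrations of an unseen, generated concept, then a query.
support_set = [
    Demonstration("isekai/concept_a_1.png", "concept A"),
    Demonstration("isekai/concept_a_2.png", "concept A"),
]
prompt = build_link_context_prompt(support_set, "isekai/query.png")
# response = mllm_generate(prompt)  # hypothetical MLLM inference call
```

The design choice mirrored here is that the demonstration explicitly binds each label to its image before the query is posed, so the model must rely on the causal link established in context rather than on concept names memorized during training.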
