MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

12/21/2022
by   Zhiyang Xu, et al.
0

Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions, has shown promising zero-shot performance on various natural language processing tasks. However, it's still not explored for vision and multimodal tasks. In this work, we introduce MultiInstruct, the first multimodal instruction tuning benchmark dataset that consists of 47 diverse multimodal tasks covering 11 broad categories. Each task is designed at least with 5,000 instances (input-out pairs) from existing open-source datasets and 5 expert-written instructions. We take OFA as the base pre-trained model for multimodal instruction tuning, and to improve its performance, we explore multiple transfer learning strategies to leverage the large-scale Natural Instructions dataset. Experimental results demonstrate its strong zero-shot performance on various unseen multimodal tasks and the benefit of transfer learning from text-only instructions. We also design a new evaluation metric: Sensitivity, to evaluate how sensitive the model is to the variety of instructions. Our results indicate that the model is less sensitive to the varying instructions after finetuning on a diverse set of tasks and instructions for each task.

READ FULL TEXT

page 2

page 4

research
05/25/2022

Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

Instruction tuning is an emergent paradigm in NLP wherein natural langua...
research
06/07/2023

M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

Instruction tuning has significantly advanced large language models (LLM...
research
09/18/2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Text language models have shown remarkable zero-shot capability in gener...
research
04/17/2023

InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

Large language models have unlocked strong multi-task capabilities from ...
research
10/17/2022

Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization

Training language models to learn from human instructions for zero-shot ...
research
04/25/2023

TABLET: Learning From Instructions For Tabular Data

Acquiring high-quality data is often a significant challenge in training...
research
03/14/2022

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

Providing natural language instructions in prompts is a useful new parad...

Please sign up or login with your details

Forgot password? Click here to reset