M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

06/07/2023
by Lei Li, et al.

Instruction tuning has significantly advanced large language models (LLMs) such as ChatGPT, enabling them to align with human instructions across diverse tasks. However, progress in open vision-language models (VLMs) has been limited by the scarcity of high-quality instruction datasets. To address this gap and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M^3IT) dataset, designed to optimize VLM alignment with human instructions. M^3IT comprises 40 carefully curated datasets, totaling 2.4 million instances and 400 manually written task instructions, all reformatted into a vision-to-text structure. Key tasks are translated into 80 languages with an advanced translation system, ensuring broader accessibility. M^3IT surpasses previous datasets in task coverage, number of instructions, and instance scale. Moreover, we develop Ying-VLM, a VLM trained on M^3IT, which shows the potential to answer complex questions requiring world knowledge, generalize to unseen video tasks, and comprehend unseen instructions in Chinese. We have open-sourced the dataset to encourage further research.

research
12/21/2022

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Instruction tuning, a new learning paradigm that fine-tunes pre-trained ...
research
06/26/2023

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Despite the promising progress in multi-modal tasks, current large multi...
research
05/24/2023

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT4,...
research
04/17/2023

LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction

Instruction tuning enables language models to generalize more effectivel...
research
06/11/2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

Large language models have become a potential pathway toward achieving a...
research
05/24/2023

PIVOINE: Instruction Tuning for Open-world Information Extraction

We consider the problem of Open-world Information Extraction (Open-world...
research
06/24/2023

Thinking Like an Annotator: Generation of Dataset Labeling Instructions

Large-scale datasets are essential to modern day deep learning. Advocate...
