CiT: Curation in Training for Effective Vision-Language Data

01/05/2023
by Hu Xu, et al.

Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford. This paper trades generality for efficiency and presents Curation in Training (CiT), a simple and efficient vision-text learning algorithm that couples a data objective into training. CiT automatically yields high-quality data to speed up contrastive image-text training and alleviates the need for an offline data filtering pipeline, allowing broad data sources (including raw image-text pairs from the web). CiT contains two loops: an outer loop curating the training data and an inner loop consuming the curated training data. The text encoder connects the two loops. Given metadata for tasks of interest, e.g., class names, and a large pool of image-text pairs, CiT alternately selects relevant training data from the pool by measuring the similarity between their text embeddings and the embeddings of the metadata. In our experiments, we observe that CiT can speed up training by over an order of magnitude, especially if the raw data size is large.
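The two-loop procedure can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the authors' implementation: embed_texts is a hash-based placeholder for the shared text encoder (in CiT the encoder being trained produces these embeddings, so curation sharpens as training proceeds), the similarity threshold is an assumed hyperparameter, and the inner contrastive update is left as a comment.

```python
# Illustrative sketch of CiT's outer (curation) / inner (training) loops.
# Not the paper's code: embed_texts and the threshold are placeholders.
import numpy as np

EMBED_DIM = 64

def embed_texts(texts):
    """Placeholder text encoder: hash-seeded pseudo-embeddings, L2-normalized.
    In CiT this would be the text encoder currently being trained."""
    vecs = []
    for t in texts:
        g = np.random.default_rng(abs(hash(t)) % (2 ** 32))
        vecs.append(g.standard_normal(EMBED_DIM))
    vecs = np.asarray(vecs)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def curate(captions, metadata, threshold):
    """Outer loop: keep pairs whose caption embedding is close to any metadata embedding."""
    cap = embed_texts(captions)    # (N, D), unit norm
    meta = embed_texts(metadata)   # (M, D), unit norm
    sims = cap @ meta.T            # cosine similarities
    return np.flatnonzero(sims.max(axis=1) >= threshold)

# Toy pool of (image path, caption) pairs and task metadata (e.g., class names).
pool = [
    ("img_001.jpg", "a photo of a golden retriever playing fetch"),
    ("img_002.jpg", "screenshot of a spreadsheet"),
    ("img_003.jpg", "a tabby cat sleeping on a sofa"),
]
metadata = ["dog", "cat"]

for outer_step in range(3):
    captions = [c for _, c in pool]
    idx = curate(captions, metadata, threshold=0.2)  # threshold is a tunable hyperparameter
    batch = [pool[i] for i in idx]
    # Inner loop: a contrastive (CLIP-style) update of the image and text encoders
    # on the curated batch would go here; the updated text encoder then drives
    # the next round of curation.
    print(f"outer step {outer_step}: curated {len(batch)} of {len(pool)} pairs")
```

Because curation happens during training rather than as an offline filtering pass, the selected subset adapts as the text encoder improves, which is what lets CiT consume broad, noisy web data directly.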


Related research

09/05/2023 · CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Nowadays, the research on Large Vision-Language Models (LVLMs) has been ...

07/19/2023 · Improving Multimodal Datasets with Image Captioning
Massive web datasets play a key role in the success of large vision-lang...

09/13/2023 · TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification
Vision and Language Models (VLMs), such as CLIP, have enabled visual rec...

05/30/2023 · ConES: Concept Embedding Search for Parameter Efficient Tuning Large Vision Language Models
Large pre-trained vision-language models have shown great prominence in ...

11/22/2021 · RedCaps: web-curated image-text data created by the people, for the people
Large datasets of paired images and text have become increasingly popula...

06/06/2023 · LLMZip: Lossless Text Compression using Large Language Models
We provide new estimates of an asymptotic upper bound on the entropy of ...

09/02/2021 · An Empirical Exploration in Quality Filtering of Text Data
While conventional wisdom suggests that more aggressively filtering data...
