Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning

03/21/2023
by Zaid Khan, et al.

Contrastive vision-language models (e.g., CLIP) are typically created by updating all the parameters of a vision model and a language model through contrastive training. Can such models instead be created with a small number of parameter updates to an already-trained language model and vision model? The literature describes techniques that can create vision-language models by updating a small number of parameters in a language model, but these require already-aligned visual representations and are non-contrastive, hence unusable for latency-sensitive applications such as neural search. We explore the feasibility and benefits of parameter-efficient contrastive vision-language alignment through transfer learning: creating a model such as CLIP by minimally updating an already-trained vision and language model. We find that a minimal set of parameter updates (<7% of parameters) can achieve the same performance as full-model training, and that updating specific components (<1% of parameters) can match 75% of full-model training. We describe a series of experiments: we show that existing knowledge is conserved more strongly by parameter-efficient training, and that the benefits of parameter-efficient training scale with model and dataset size. Where paired image-text data is scarce but strong multilingual language models exist (e.g., for low-resource languages), parameter-efficient training is even preferable to full-model training. Given a fixed compute budget, parameter-efficient training allows training larger models on the same hardware, achieving equivalent performance in less time. Parameter-efficient training hence constitutes an energy-efficient and effective training strategy for contrastive vision-language models that may be preferable to the full-model training paradigm for common use cases. Code and weights at https://github.com/codezakh/LilT.
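The repository above contains the authors' actual code. Purely as an illustration of the idea, the PyTorch sketch below freezes both pretrained towers, unfreezes only their LayerNorm parameters (one illustrative choice of a tiny trainable subset), and aligns the two encoders with the standard symmetric InfoNCE loss used by CLIP. The class name `LiteAligner` and the projection-head design are this sketch's assumptions, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mark_trainable(encoder: nn.Module) -> None:
    """Freeze the encoder, then unfreeze only its LayerNorm parameters
    (one illustrative choice of a small trainable subset)."""
    for p in encoder.parameters():
        p.requires_grad = False
    for m in encoder.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True

class LiteAligner(nn.Module):
    """CLIP-style dual encoder: frozen towers, small trainable heads."""
    def __init__(self, vision_encoder, text_encoder, vis_dim, txt_dim, embed_dim=256):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.text_encoder = text_encoder
        mark_trainable(self.vision_encoder)
        mark_trainable(self.text_encoder)
        # New projection heads are trained from scratch.
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Learnable temperature, initialized to log(1/0.07) as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, images, texts):
        v = F.normalize(self.vis_proj(self.vision_encoder(images)), dim=-1)
        t = F.normalize(self.txt_proj(self.text_encoder(texts)), dim=-1)
        logits = self.logit_scale.exp() * v @ t.t()
        # Symmetric InfoNCE: matched image-text pairs lie on the diagonal.
        labels = torch.arange(len(logits), device=logits.device)
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2

if __name__ == "__main__":
    # Stand-in encoders for a smoke test; real use would pass pretrained
    # towers (e.g. a ViT and a BERT) whose weights stay frozen.
    vision = nn.Sequential(nn.Linear(32, 64), nn.LayerNorm(64))
    text = nn.Sequential(nn.Linear(16, 48), nn.LayerNorm(48))
    model = LiteAligner(vision, text, vis_dim=64, txt_dim=48)
    loss = model(torch.randn(8, 32), torch.randn(8, 16))
    loss.backward()
```

Because gradients flow only into the LayerNorm parameters and the projection heads, optimizer state and activation memory shrink accordingly, which is what lets a fixed compute budget accommodate a larger frozen backbone, per the abstract's compute-budget argument.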
