AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

11/12/2022
by Zhongzhi Chen, et al.

In this work, we present a conceptually simple and effective method for training a strong bilingual/multilingual multimodal representation model. Starting from CLIP, the pre-trained multimodal representation model released by OpenAI, we replace its text encoder with the pre-trained multilingual text encoder XLM-R and align the language and image representations through a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations on a wide range of tasks. We set new state-of-the-art performance on several tasks, including ImageNet-CN, Flickr30k-CN, COCO-CN, and XTD. Further, we obtain performance very close to that of CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding. Our models and code are available at https://github.com/FlagAI-Open/FlagAI.
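
The two training stages named in the abstract can be illustrated with a rough sketch. The abstract does not spell out the loss functions, so the forms below are assumptions: stage one is written as a mean squared error that pulls the new text encoder's embeddings toward the original CLIP text embeddings, and stage two as a CLIP-style symmetric contrastive loss over image-text pairs. The function names, the temperature value, and the use of plain NumPy arrays in place of real encoder outputs are all hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def teacher_learning_loss(student_text_emb, teacher_text_emb):
    """Stage 1 (assumed form): MSE between the student text embeddings
    (e.g. from a multilingual encoder) and the frozen teacher embeddings
    (e.g. from the original CLIP text encoder) for the same sentences."""
    return float(np.mean((student_text_emb - teacher_text_emb) ** 2))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Stage 2 (assumed form): symmetric cross-entropy over cosine
    similarities, where row i of image_emb is paired with row i of text_emb.
    The temperature of 0.07 is an illustrative value, not a reported one."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_image_to_text + loss_text_to_image) / 2)


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
student = teacher + 0.1 * rng.normal(size=(8, 16))
images = rng.normal(size=(8, 16))
stage1 = teacher_learning_loss(student, teacher)
stage2 = contrastive_loss(images, student)
```

In this sketch, stage one needs only text (the teacher and student see the same or parallel sentences), while stage two needs paired images and text. That split is one plausible reading of "teacher learning and contrastive learning"; the full paper should be consulted for the actual objectives.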

research
05/31/2023

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

We present XPhoneBERT, the first multilingual model pre-trained to learn...
research
11/22/2022

X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

Vision language pre-training aims to learn alignments between vision and...
research
10/22/2020

mT5: A massively multilingual pre-trained text-to-text transformer

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified ...
research
11/25/2022

CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels

Pre-trained vision-language models like CLIP have recently shown superio...
research
03/23/2023

SwissBERT: The Multilingual Language Model for Switzerland

We present SwissBERT, a masked language model created specifically for p...
research
08/29/2023

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

There has been a growing interest in developing multimodal machine trans...
research
02/12/2022

Indication as Prior Knowledge for Multimodal Disease Classification in Chest Radiographs with Transformers

When a clinician refers a patient for an imaging exam, they include the ...
