Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

06/14/2023
by Gregor Geigle, et al.

Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. The bulk of the evaluation of these models is, however, performed with English text only: the costly creation of language-specific image-caption datasets has limited multilingual VL benchmarks to a handful of high-resource languages. In this work, we introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of 1000 ImageNet labels into 92 languages, built without resorting to machine translation (MT) or requiring manual annotation. We instead automatically obtain reliable translations of ImageNet concepts by linking them, via shared WordNet synsets, to BabelNet, a massively multilingual lexico-semantic network. We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e.g., German or Chinese), and an even bigger gap for low-resource languages (e.g., Sinhala or Lao). Crucially, we show that the models' ZS-IC performance on Babel-ImageNet correlates highly with their performance in image-text retrieval, validating that Babel-ImageNet is suitable for estimating the quality of the multilingual VL representation spaces for the vast majority of languages that lack gold image-text data. Finally, we show that the performance of multilingual CLIP for low-resource languages can be drastically improved via cheap, parameter-efficient language-specific training. We make our code and data publicly available: <https://github.com/gregor-ge/Babel-ImageNet>
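To make the ZS-IC setup concrete, the following is a minimal sketch of how zero-shot classification with a dual-encoder model is commonly scored: each image embedding is compared against the text embeddings of the class labels by cosine similarity, and the highest-scoring label is the prediction. It is an illustration under stated assumptions, not the authors' evaluation code. The function names and the toy three-dimensional vectors are hypothetical; in practice the embeddings would come from a CLIP image encoder and a multilingual text encoder, and the label set for a given language would contain only the classes that have a translation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_scores(image_emb, label_embs):
    """Cosine similarity between one image embedding and every label embedding."""
    img = l2_normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, l2_normalize(label)))
        for label in label_embs
    ]


def zero_shot_predict(image_emb, label_embs):
    """Return the index of the label whose embedding is most similar to the image."""
    scores = cosine_scores(image_emb, label_embs)
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(image_embs, gold_labels, label_embs):
    """Fraction of images whose top-scoring label matches the gold label."""
    hits = sum(
        zero_shot_predict(emb, label_embs) == gold
        for emb, gold in zip(image_embs, gold_labels)
    )
    return hits / len(gold_labels)


# Toy data (hypothetical): three class-label embeddings for one language
# and three image embeddings with their gold class indices.
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.5, 0.0]]
gold_labels = [0, 1, 1]

accuracy = top1_accuracy(image_embs, gold_labels, label_embs)
print(f"top-1 accuracy: {accuracy:.3f}")
```

In this toy example the third image is closer to class 0 than to its gold class 1, so two of the three predictions are correct. Running the same scoring per language, with that language's label embeddings, gives the kind of per-language accuracy the benchmark compares.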
