GeneCIS: A Benchmark for General Conditional Image Similarity

06/13/2023
by Sagar Vaze, et al.

We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find our method offers a substantial boost over the baselines on GeneCIS, and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States. Project page at https://sgvaze.github.io/genecis/.
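
To make the idea of conditional similarity concrete, the following is a minimal toy sketch, not the authors' method or the GeneCIS evaluation code. It assumes hand-made 4-dimensional embeddings and a hypothetical way of combining an image embedding with a condition embedding (averaging the two normalized vectors); the names `conditional_query` and `rank_gallery` are invented for illustration. It shows how the same query image can rank a gallery differently depending on the condition.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def conditional_query(image_emb, condition_emb):
    """Hypothetical combiner: average the normalized image and condition vectors."""
    img = normalize(image_emb)
    cond = normalize(condition_emb)
    return normalize([(i + c) / 2 for i, c in zip(img, cond)])

def rank_gallery(query, gallery):
    """Return gallery names sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)

# Toy embedding axes (an assumption for illustration): [car, dog, red, blue]
query_image = [1.0, 0.0, 1.0, 0.0]          # a red car
gallery = {
    "blue_car": [1.0, 0.0, 0.0, 1.0],
    "red_dog": [0.0, 1.0, 1.0, 0.0],
}
conditions = {
    "car": [1.0, 0.0, 0.0, 0.0],            # focus on the object
    "red": [0.0, 0.0, 1.0, 0.0],            # focus on the color
}

by_object = rank_gallery(conditional_query(query_image, conditions["car"]), gallery)
by_color = rank_gallery(conditional_query(query_image, conditions["red"]), gallery)
print("condition 'car':", by_object)
print("condition 'red':", by_color)
```

Without a condition, both gallery images are equally similar to the query in this toy setup; the condition is what breaks the tie, which is the behavior a fixed embedding function cannot express.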
