InDiReCT: Language-Guided Zero-Shot Deep Metric Learning for Images

11/23/2022
by Konstantin Kobs, et al.

Common Deep Metric Learning (DML) datasets specify only one notion of similarity, e.g., two images in the Cars196 dataset are deemed similar if they show the same car model. We argue that, depending on the application, users of image retrieval systems have different and changing similarity notions that should be incorporated as easily as possible. Therefore, we present Language-Guided Zero-Shot Deep Metric Learning (LanZ-DML) as a new DML setting in which users control which properties matter for the image representations using only natural language, without any training data. To this end, we propose InDiReCT (Image representations using Dimensionality Reduction on CLIP embedded Texts), a model for LanZ-DML on images that is trained exclusively on a few text prompts. InDiReCT uses CLIP as a fixed feature extractor for images and texts and transfers the variation in the text prompt embeddings to the image embedding space. Extensive experiments on five datasets and thirteen similarity notions in total show that, despite not seeing any images during training, InDiReCT performs better than strong baselines and approaches the performance of fully supervised models. An analysis reveals that InDiReCT learns to focus on the image regions that correlate with the desired similarity notion, which makes it a fast-to-train and easy-to-use method for creating custom embedding spaces using only natural language.
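The core idea of fitting a dimensionality reduction on text prompt embeddings and applying it to image embeddings can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which reduction technique InDiReCT uses, so plain PCA (via SVD) is assumed here as a stand-in, and synthetic vectors replace real CLIP embeddings so that the example runs without a model. The function names (`fit_text_projection`, `embed_images`) and the toy data are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fit_text_projection(text_emb, n_components):
    """Fit PCA on text embeddings (assumed stand-in for the paper's reduction).

    Returns a (d, n_components) matrix whose columns are the directions
    along which the text prompt embeddings vary the most.
    """
    centered = text_emb - text_emb.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def embed_images(image_emb, projection):
    """Project image embeddings into the text-derived subspace."""
    return l2_normalize(image_emb @ projection)


# Synthetic stand-ins for CLIP embeddings (assumption: real use would call CLIP).
rng = np.random.default_rng(0)
d = 16
basis = np.eye(d)

# Hypothetical prompts vary only along axes 0 and 1 (the "desired" property).
coefficients = rng.normal(size=(8, 2))
text_emb = 0.1 * rng.normal(size=d) + coefficients @ basis[:2]

# Images a and b share the desired property but differ in a nuisance factor
# (axis 2); image c shares the nuisance factor with a but not the property.
a = basis[0] + 5.0 * basis[2]
b = basis[0] - 5.0 * basis[2]
c = basis[1] + 5.0 * basis[2]
images = np.stack([a, b, c])

projection = fit_text_projection(text_emb, n_components=2)
raw = l2_normalize(images)
reduced = embed_images(images, projection)

sim_ab_raw = float(raw[0] @ raw[1])          # dominated by the nuisance axis
sim_ac_raw = float(raw[0] @ raw[2])
sim_ab = float(reduced[0] @ reduced[1])      # nuisance axis projected away
sim_ac = float(reduced[0] @ reduced[2])

print(f"raw:     sim(a,b)={sim_ab_raw:.3f}  sim(a,c)={sim_ac_raw:.3f}")
print(f"reduced: sim(a,b)={sim_ab:.3f}  sim(a,c)={sim_ac:.3f}")
```

In this toy setup the raw embeddings rank `c` as closer to `a` than `b` is, because the nuisance axis dominates. After projecting onto the directions in which the prompts vary, `a` and `b` become the nearest pair, which mirrors the abstract's claim that variation in the text embeddings defines what the image embedding space attends to.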


