IRGen: Generative Modeling for Image Retrieval

by Yidan Zhang, et al.

While generative modeling has become ubiquitous in natural language processing and computer vision, its application to image retrieval remains largely unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the ongoing unification of these fields. Our framework, IRGen, is a unified model that enables end-to-end differentiable search and thus achieves superior performance through direct optimization. In developing IRGen, we tackle the key technical challenge of converting an image into a short sequence of semantic units, which is essential for efficient and effective retrieval. Empirical experiments demonstrate significant improvements on three widely used benchmarks; for example, precision@10 on the In-shop dataset is 22.9% higher than the best baseline, with a comparable recall@10 score.
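The core idea of mapping an image to a short sequence of discrete semantic IDs, which a sequence-to-sequence model can then generate autoregressively at search time, can be sketched with a toy residual quantizer. This is a minimal illustration, not the paper's implementation: the codebook sizes, sequence length, and function names here are assumptions chosen for clarity.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map a continuous image embedding to a short sequence of
    discrete semantic IDs. Each quantization level picks the nearest
    centroid and passes the remaining residual to the next level,
    so a few tokens capture progressively finer detail."""
    ids = []
    residual = embedding.astype(np.float64).copy()
    for codebook in codebooks:  # each codebook: (K, D) array of centroids
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        ids.append(idx)
        residual = residual - codebook[idx]  # encode what is left over
    return ids

# Illustrative sizes: 16-dim embedding, 4 levels of 8 centroids each,
# yielding a 4-token identifier per image.
rng = np.random.default_rng(0)
D, K, M = 16, 8, 4
codebooks = [rng.normal(size=(K, D)) for _ in range(M)]
embedding = rng.normal(size=D)
ids = residual_quantize(embedding, codebooks)
# 'ids' is the short semantic-ID sequence that a seq2seq decoder
# would be trained to generate directly from a query image.
```

In this framing, retrieval becomes generation: the decoder emits the identifier tokens of the nearest database image one by one, which is what makes the whole search pipeline end-to-end differentiable.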


The Fashion IQ Dataset: Retrieving Images by Combining Side Information and Relative Natural Language Feedback

We contribute a new dataset and a novel method for natural language base...

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

This paper explores the task of interactive image retrieval using natura...

Optimizing Rank-based Metrics with Blackbox Differentiation

Rank-based metrics are some of the most widely used criteria for perform...

Global Features are All You Need for Image Retrieval and Reranking

Utilizing a two-stage paradigm comprising coarse image retrieval and ...

Finding Point with Image: An End-to-End Benchmark for Vision-based UAV Localization

In the past, image retrieval was the mainstream solution for cross-view ...

Genetic Algorithms for the Optimization of Diffusion Parameters in Content-Based Image Retrieval

Several computer vision and artificial intelligence projects are nowaday...