Investigating the Vision Transformer Model for Image Retrieval Tasks

01/11/2021
by   Socratis Gkelios, et al.

This paper introduces a plug-and-play descriptor that can be adopted for image retrieval tasks without prior initialization or preparation. The descriptor is built on the recently proposed Vision Transformer network and requires no training data to adjust its parameters. In image retrieval, handcrafted global and local descriptors have, over recent years, been successfully replaced by Convolutional Neural Network (CNN)-based methods. However, the experimental evaluation conducted in this paper, on several benchmark datasets and against 36 state-of-the-art descriptors from the literature, demonstrates that a neural network containing no convolutional layers, such as the Vision Transformer, can form a global descriptor and achieve competitive results. Because no fine-tuning is required, the low complexity of the presented methodology encourages adoption of the architecture as a baseline model for image retrieval, replacing the traditional and well-adopted CNN-based approaches and inaugurating a new era in image retrieval approaches.
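
As a rough illustration of how a global descriptor is typically used for retrieval, the sketch below ranks database images by similarity to a query descriptor. It assumes each image has already been mapped to one fixed-length vector (for example, by a pretrained network), and it assumes cosine similarity as the ranking measure; the abstract does not state which layer or which similarity measure the authors use. The function names `l2_normalize` and `rank_by_cosine` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_by_cosine(query: Sequence[float],
                   database: Sequence[Sequence[float]]) -> List[int]:
    """Return database indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    scores = []
    for idx, desc in enumerate(database):
        d = l2_normalize(desc)
        # The dot product of unit vectors equals their cosine similarity.
        scores.append((sum(a * b for a, b in zip(q, d)), idx))
    # Sort by descending similarity; ties are broken by ascending index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]


# Toy 3-dimensional vectors standing in for real descriptors (hypothetical values).
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
ranking = rank_by_cosine(query, database)
print(ranking)
```

Because no parameters are learned at this stage, swapping in a different feature extractor only changes how the vectors are produced; the ranking step stays the same.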


research
05/08/2019

2-bit Model Compression of Deep Convolutional Neural Network on ASIC Engine for Image Retrieval

Image retrieval utilizes image descriptors to retrieve the most similar ...
research
08/25/2019

A Comparison of CNN and Classic Features for Image Retrieval

Feature detectors and descriptors have been successfully used for variou...
research
08/05/2016

SIFT Meets CNN: A Decade Survey of Instance Retrieval

In the early days, content-based image retrieval (CBIR) was studied with...
research
03/10/2020

A CNN-based Patent Image Retrieval Method for Design Ideation

The patent database is often used in searches of inspirational stimuli f...
research
03/29/2018

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

In this paper we address issues with image retrieval benchmarking on sta...
research
06/15/2019

REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval

This paper addresses the problem of very large-scale image retrieval, fo...
research
01/20/2020

UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision

In this paper, we explore how three related tasks, namely keypoint detec...
