A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

12/22/2016
by Deng Cai, et al.

Approximate Nearest Neighbor (ANN) search is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform other state-of-the-art methods. However, there are serious drawbacks in the evaluation of existing hashing papers, and most of the claims in these papers should be re-examined. 1) Most of the existing papers fail to correctly measure the search time, which is essential for the ANN search problem. 2) As a result, most of the papers report that performance increases as the code length increases, which is wrong if the search time is measured correctly. 3) The performance of some hashing algorithms (e.g., LSH) can easily be boosted by using multiple hash tables, an important factor that should be considered in the evaluation but that most of the papers fail to take into account. In this paper, we carefully revisit many popular hashing algorithms and suggest one possible promising direction. For the sake of reproducibility, all the code used in the paper is released on GitHub and can be used as a testing platform to fairly compare various hashing algorithms.
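
To make point 3 concrete, the following is a minimal sketch of multi-table LSH. It is not the code released with the paper. It assumes sign-random-projection hashing, and the names `MultiTableLSH`, `hash_code`, `n_bits`, and `n_tables` are hypothetical. The number of candidates that must be scanned is returned alongside the result, as a rough stand-in for the search time that the abstract argues should be measured.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hash_code(planes, x):
    """One bit per random hyperplane: the sign of the projection of x."""
    return tuple(sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0.0 for p in planes)


class MultiTableLSH:
    """Illustrative multi-table LSH index (hypothetical, not the paper's code)."""

    def __init__(self, dim, n_bits, n_tables, seed=0):
        rng = random.Random(seed)
        # Each table gets its own independent set of n_bits hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = []

    def add(self, x):
        idx = len(self.data)
        self.data.append(x)
        for planes, table in zip(self.planes, self.tables):
            table[hash_code(planes, x)].append(idx)

    def candidates(self, q):
        """Union of the buckets that q falls into, one bucket per table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(hash_code(planes, q), ()))
        return found

    def query(self, q):
        """Return (index of best candidate, number of candidates scanned)."""
        cands = self.candidates(q)
        if not cands:
            return None, 0
        best = min(cands, key=lambda i: sq_dist(self.data[i], q))
        return best, len(cands)


# Synthetic data, for illustration only.
rng = random.Random(42)
dim = 16
points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(500)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# With the same seed, the single table is identical to table 0 of the
# multi-table index, so its candidate set is a subset of the larger one.
single = MultiTableLSH(dim, n_bits=12, n_tables=1)
multi = MultiTableLSH(dim, n_bits=12, n_tables=8)
for p in points:
    single.add(p)
    multi.add(p)

print(len(single.candidates(query)), len(multi.candidates(query)))
```

Adding tables can only enlarge the candidate set. This is why recall tends to improve with more tables, and also why a fair comparison has to account for the extra candidates scanned rather than report accuracy at a fixed code length alone.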

research
11/16/2017

A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval

There is a growing trend in studying deep hashing methods for content-ba...
research
07/06/2021

DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

Kernel Density Estimation (KDE) is a nonparametric method for estimating...
research
06/27/2012

On the Difficulty of Nearest Neighbor Search

Fast approximate nearest neighbor (NN) search in large databases is beco...
research
09/18/2017

Beyond SIFT using Binary features for Loop Closure Detection

In this paper a binary feature based Loop Closure Detection (LCD) method...
research
05/27/2019

On the Evaluation Metric for Hashing

Due to its low storage cost and fast query speed, hashing has been widel...
research
12/11/2021

SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings

Learning from set-structured data is an essential problem with many appl...
research
04/05/2023

Unfolded Self-Reconstruction LSH: Towards Machine Unlearning in Approximate Nearest Neighbour Search

Approximate nearest neighbour (ANN) search is an essential component of ...
