SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings

12/11/2021
by   Yuzhe Lu, et al.

Learning from set-structured data is an essential problem with many applications in machine learning and computer vision. This paper focuses on non-parametric and data-independent learning from set-structured data using approximate nearest neighbor (ANN) solutions, particularly locality-sensitive hashing. We consider the problem of set retrieval from an input set query. Such a retrieval problem requires: 1) an efficient mechanism to calculate the distances/dissimilarities between sets, and 2) an appropriate data structure for fast nearest neighbor search. To that end, we propose Sliced-Wasserstein set embedding as a computationally efficient "set-2-vector" mechanism that enables downstream ANN, with theoretical guarantees. The set elements are treated as samples from an unknown underlying distribution, and the Sliced-Wasserstein distance is used to compare sets. We demonstrate the effectiveness of our algorithm, denoted as Set-LOcality Sensitive Hashing (SLOSH), on various set retrieval datasets. We compare our proposed embedding with standard set embedding approaches, including Generalized Mean (GeM) embedding/pooling, Featurewise Sort Pooling (FSPool), and Covariance Pooling, and show consistent improvement in retrieval results. The code for replicating our results is available at https://github.com/mint-vu/SLOSH.
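To make the "set-2-vector" idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation (their code is in the repository linked above). The function names `sw_embed` and `lsh_key`, the use of 16 quantiles per slice, and the floor-of-random-projection hash are all choices made for this example. Each set is projected onto random unit directions (the "slices"), each slice is sorted, and the sorted values are resampled at fixed quantile levels so that sets of different sizes map to vectors of equal length. The Euclidean distance between two such vectors then serves as an approximation of the Sliced-Wasserstein distance between the sets, and the vectors can be passed to a standard Euclidean LSH scheme.

```python
import numpy as np


def sw_embed(points, thetas, n_quantiles=16):
    """Map a set of shape (n, d) to a fixed-length vector of sliced quantiles.

    Hypothetical helper for illustration; `thetas` has shape (L, d) and holds
    unit-norm projection directions.
    """
    points = np.asarray(points, dtype=float)
    proj_sorted = np.sort(points @ thetas.T, axis=0)  # (n, L), each slice sorted
    n = proj_sorted.shape[0]
    src = (np.arange(n) + 0.5) / n                     # quantile levels of the set
    tgt = (np.arange(n_quantiles) + 0.5) / n_quantiles # common quantile grid
    cols = [np.interp(tgt, src, proj_sorted[:, l]) for l in range(thetas.shape[0])]
    emb = np.stack(cols, axis=1)                       # (n_quantiles, L)
    # Scale so the squared Euclidean norm averages over slices and quantiles.
    return emb.ravel() / np.sqrt(emb.size)


def lsh_key(vec, a, b, width):
    """Bucket key from random projections (assumed hash family for this sketch)."""
    return tuple(np.floor((a @ vec + b) / width).astype(int))


rng = np.random.default_rng(0)
d, n_slices = 3, 32
thetas = rng.normal(size=(n_slices, d))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)

set_a = rng.normal(size=(50, d))
set_a_shuffled = set_a[rng.permutation(50)]        # same set, different order
set_b = rng.normal(loc=5.0, size=(40, d))          # different size and location

emb_a = sw_embed(set_a, thetas)
emb_a_shuffled = sw_embed(set_a_shuffled, thetas)
emb_b = sw_embed(set_b, thetas)

# Random hash parameters shared by all sets in the index.
n_hashes, width = 4, 2.0
a = rng.normal(size=(n_hashes, emb_a.size))
b = rng.uniform(0.0, width, size=n_hashes)

key_a = lsh_key(emb_a, a, b, width)
key_b = lsh_key(emb_b, a, b, width)

dist_same = np.linalg.norm(emb_a - emb_a_shuffled)
dist_diff = np.linalg.norm(emb_a - emb_b)
print("distance to shuffled copy:", dist_same)
print("distance to shifted set:  ", dist_diff)
```

Because each slice is sorted before resampling, the embedding does not depend on the order of the set elements, and sets with different numbers of elements still produce vectors of the same length. In a retrieval setting, sets whose keys collide in one or more hash tables would be treated as candidate neighbors and re-ranked by the embedding distance.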


