Efficient Inner Product Approximation in Hybrid Spaces

by Xiang Wu, et al.

Many emerging use cases of data mining and machine learning operate on large datasets that combine data from heterogeneous sources, specifically data with both sparse and dense components. For example, dense deep neural network embedding vectors are often used in conjunction with sparse textual features to provide high-dimensional hybrid representations of documents. Efficient search in such hybrid spaces is very challenging because the techniques that perform well for sparse vectors have little overlap with those that work well for dense vectors. Popular techniques such as Locality Sensitive Hashing (LSH) and its data-dependent variants also do not give good accuracy in high-dimensional hybrid spaces. Although hybrid scenarios are becoming more prevalent, the literature currently offers no techniques that are both fast and accurate. In this paper, we propose a technique that approximates the inner product computation between hybrid vectors, leading to substantial speedup in search while maintaining high accuracy. We also propose efficient data structures that exploit modern computer architectures, resulting in search that is orders of magnitude faster than the existing baselines. The performance of the proposed method is demonstrated on several datasets, including a very large-scale industrial dataset containing one billion vectors in a billion-dimensional space, where it achieves over 10x speedup and higher accuracy than competitive baselines.
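To make the quantity being approximated concrete, the sketch below computes the exact inner product between two hybrid vectors by brute force. It is not the technique proposed in the paper. It only illustrates that the hybrid inner product splits into a sparse term plus a dense term, which is the computation an approximate method would aim to speed up. The `HybridVector` layout (a dict for the sparse part, a list for the dense part) and the function names `hybrid_inner_product` and `exact_search` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not from the paper):
# sparse part as {dimension index: value}, dense part as a list of floats.
HybridVector = Tuple[Dict[int, float], List[float]]


def hybrid_inner_product(q: HybridVector, x: HybridVector) -> float:
    """Exact inner product: <q, x> = <q_sparse, x_sparse> + <q_dense, x_dense>."""
    q_sparse, q_dense = q
    x_sparse, x_dense = x
    if len(q_dense) != len(x_dense):
        raise ValueError("dense components must have the same dimensionality")

    # Iterate over the smaller sparse map; only shared dimensions contribute.
    if len(q_sparse) > len(x_sparse):
        q_sparse, x_sparse = x_sparse, q_sparse
    sparse_part = sum(v * x_sparse[d] for d, v in q_sparse.items() if d in x_sparse)

    dense_part = sum(a * b for a, b in zip(q_dense, x_dense))
    return sparse_part + dense_part


def exact_search(query: HybridVector, dataset: List[HybridVector], k: int = 1) -> List[int]:
    """Brute-force baseline: indices of the k datapoints with the largest inner product."""
    scored = [(hybrid_inner_product(query, x), i) for i, x in enumerate(dataset)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [i for _, i in scored[:k]]


query = ({3: 2.0, 1_000_000: 1.0}, [1.0, 0.0])
dataset = [
    ({3: 0.5, 7: 4.0}, [0.5, 2.0]),
    ({1_000_000: 3.0}, [0.0, 0.0]),
]
top = exact_search(query, dataset, k=2)
```

The cost of this baseline grows linearly with the number of datapoints and with the number of nonzero and dense dimensions per datapoint, which is why exhaustive scoring becomes impractical at the billion-vector scale the abstract mentions.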




