Tree-wise Distribution Sensitive hashing: Efficient Maximum likelihood Classification by joint dimensionality reduction in known probabilistic settings

05/11/2019
by Arash Gholami Davoodi, et al.

We consider the problem of maximum likelihood classification of a high-dimensional data point y into one of billions of classes x_1, ..., x_N, where the conditional probability p(y|x) is known. In the most general case, the complexity of the brute-force method for this classification grows linearly, O(N), with the number of classes N. Efficient multiclass classification methods have been introduced to solve this problem with logarithmic complexity. However, these methods suffer from the curse of dimensionality: in high dimensions their complexity approaches O(N) per query data point. In the special case where the conditional distribution p(y|x) is a Gaussian centered at x, i.e., p(y|x) ∝ N(x, σ), maximum likelihood classification reduces to nearest neighbor search under the Euclidean norm. Sublinear methods based on locality sensitive hashing (LSH) have been introduced to solve an approximate version of nearest neighbor search for high-dimensional data. Inspired by these advances, we introduce distribution sensitive hashing (DSH) to solve an approximate version of the maximum likelihood classification problem through joint dimensionality reduction. For discrete probability distributions, we design TreeDSH, a universal family of distribution sensitive hashes based on decision trees, and show that its complexity grows sublinearly. Theory and simulations presented in this paper demonstrate that TreeDSH is more efficient than LSH-Hamming and Min-Hashing schemes. Finally, we apply TreeDSH to the problem of peptide identification from mass spectrometry data.
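To make the O(N) baseline and the Gaussian special case concrete, here is a minimal Python sketch (not the authors' code) of brute-force maximum likelihood classification. For an isotropic Gaussian, log p(y|x) = -||y - x||² / (2σ²) + const, so maximizing the likelihood is the same as minimizing the Euclidean distance, and the sketch checks that the two searches pick the same class. The function names, the independent-coordinate likelihood used in the discrete case, and the bit-flip channel are illustrative assumptions; the sketch does not implement LSH, DSH, or TreeDSH.

```python
import math
import random
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]


def ml_classify(y, classes, log_likelihood: Callable) -> int:
    """Brute-force maximum likelihood classification (hypothetical helper).

    Evaluates log p(y|x) for every one of the N classes, so the cost is
    O(N) likelihood evaluations per query y. Ties go to the first class.
    """
    best_index, best_score = -1, -math.inf
    for i, x in enumerate(classes):
        score = log_likelihood(y, x)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def gaussian_log_likelihood(y: Vector, x: Vector, sigma: float = 1.0) -> float:
    """log p(y|x) for an isotropic Gaussian centered at x, up to an additive constant."""
    return -sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / (2.0 * sigma ** 2)


def nearest_neighbor(y: Vector, classes: Sequence[Vector]) -> int:
    """Index of the class closest to y in Euclidean distance (first one on ties)."""
    return min(range(len(classes)), key=lambda i: math.dist(y, classes[i]))


def discrete_log_likelihood(y, x, p: Dict[Tuple[int, int], float]) -> float:
    """log p(y|x) under an ASSUMED independent-coordinate model.

    p maps (x_j, y_j) to p(y_j | x_j); the total is sum_j log p(y_j | x_j).
    """
    return sum(math.log(p[(xj, yj)]) for xj, yj in zip(x, y))


if __name__ == "__main__":
    # Gaussian case: ML classification and Euclidean nearest neighbor agree.
    centers = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    query = (0.9, 1.2)
    print(ml_classify(query, centers, gaussian_log_likelihood))  # 2
    print(nearest_neighbor(query, centers))                      # 2

    # Discrete case with an assumed channel that flips each bit with probability 0.1.
    flip = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}
    codewords = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1)]
    print(ml_classify((1, 1, 1, 0), codewords,
                      lambda y, x: discrete_log_likelihood(y, x, flip)))  # 1

    # Randomized check of the Gaussian / nearest-neighbor equivalence.
    rng = random.Random(0)
    pts = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(200)]
    q = tuple(rng.gauss(0, 1) for _ in range(8))
    assert ml_classify(q, pts, gaussian_log_likelihood) == nearest_neighbor(q, pts)
```

Every query touches all N classes, which is exactly the linear cost that hashing-based schemes such as LSH and the DSH family described above aim to avoid.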

research
12/06/2018

Confirmation Sampling for Exact Nearest Neighbor Search

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STO...
research
12/15/2019

Drawbacks and Proposed Solutions for Real-time Processing on Existing State-of-the-art Locality Sensitive Hashing Techniques

Nearest-neighbor query processing is a fundamental operation for many im...
research
04/10/2020

Supervised Autoencoders Learn Robust Joint Factor Models of Neural Activity

Factor models are routinely used for dimensionality reduction in modelin...
research
12/13/2020

Process monitoring based on orthogonal locality preserving projection with maximum likelihood estimation

By integrating two powerful methods of density reduction and intrinsic d...
research
11/20/2014

Maximum Likelihood Directed Enumeration Method in Piecewise-Regular Object Recognition

We explore the problems of classification of composite object (images, s...
research
04/11/2020

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Locality-Sensitive Hashing (LSH) is one of the most popular methods for ...
research
12/22/2017

Lattice-based Locality Sensitive Hashing is Optimal

Locality sensitive hashing (LSH) was introduced by Indyk and Motwani (ST...