Empirical complexity of comparator-based nearest neighbor descent
A Java parallel streams implementation of the K-nearest neighbor descent algorithm is presented, using a natural statistical termination criterion. Input data consist of a set S of n objects of type V, together with a Function&lt;V, Comparator&lt;V&gt;&gt;, which enables any x ∈ S to decide which of y, z ∈ S∖{x} is more similar to x. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of K-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random digraph with regular out-degree K on n vertices. Overall complexity was O(n K^2 log_K(n)) in the class of examples studied. When objects are sampled uniformly from a d-dimensional simplex, accuracy of the K-nearest neighbor approximation is high up to d = 20, but declines in higher dimensions, as theory predicts.
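The comparator-based interface described above can be sketched in Java as follows. This is a minimal illustration, not the paper's implementation: the class name `KlComparator`, the probability-vector representation `double[]`, and the helper `kl` are all assumptions introduced here. It shows how a Function&lt;V, Comparator&lt;V&gt;&gt; lets a base point x rank any two candidates y, z by Kullback-Leibler divergence from x.

```java
import java.util.Comparator;
import java.util.function.Function;

public class KlComparator {
    // Kullback-Leibler divergence D(p || q) for probability vectors
    // (hypothetical helper; assumes p and q have matching length and q[i] > 0).
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0) d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    // For each base point x, produce a Comparator that ranks candidates
    // by their divergence from x: the interface the abstract describes.
    static Function<double[], Comparator<double[]>> similarity =
            x -> Comparator.comparingDouble(y -> kl(x, y));

    public static void main(String[] args) {
        double[] x = {0.5, 0.5};
        double[] y = {0.6, 0.4}; // closer to x in KL divergence
        double[] z = {0.9, 0.1}; // farther from x
        Comparator<double[]> cmp = similarity.apply(x);
        System.out.println(cmp.compare(y, z) < 0); // prints true: y is more similar to x
    }
}
```

Given such a function, the descent algorithm never needs a numeric distance, only pairwise "which is closer to x" decisions, which is what makes a bare Comparator sufficient.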