Exact fixed-radius nearest neighbor search with an application to clustering
Fixed-radius nearest-neighbor search is a common database operation that retrieves all data points within a user-specified distance to a query point. There are efficient approximate nearest neighbor search algorithms that provide fast query responses but they often have a very compute-intensive indexing phase and require parameter tuning. Therefore, exact brute force and tree-based search methods are still widely used. Here we propose a new fixed-radius nearest neighbor search method that significantly improves over brute force and tree-based methods in terms of index and query time, returns exact results, and requires no parameter tuning. The method exploits a sorting of the data points by their first principal component, thereby facilitating a reduction in query search space. Further speedup is gained from an efficient implementation using high-level Basic Linear Algebra Subprograms (BLAS). We provide theoretical analysis of our method and demonstrate its practical performance when used stand-alone and when applied within a clustering algorithm.
READ FULL TEXT