Clustered Hierarchical Entropy-Scaling Search of Astronomical and Biological Data

08/22/2019
by   Najib Ishaq, et al.

Both astronomy and biology are experiencing explosive growth of data, resulting in a "big data" problem that stands in the way of a "big data" opportunity for discovery. A common task on such data sets is approximate search, or ρ-nearest neighbors search. We present CHESS (Clustered Hierarchical Entropy-Scaling Search), a GPU-accelerated hierarchical search tool that takes advantage of two geometric properties apparent in both astronomical and biological data sets: the metric entropy and the fractal dimensionality of the data. CHESS incurs no loss in specificity or sensitivity. It demonstrates a 6.4× speedup over linear search on the Sloan Digital Sky Survey's APOGEE data set and a 3.97× speedup on the GreenGenes 16S metagenomic data set, and it performs asymptotically fewer comparisons on APOGEE than the FALCONN locality-sensitive hashing library. CHESS also allows for implicit data compression, which we demonstrate on the APOGEE data set. Finally, we discuss an extension that allows for efficient k-nearest neighbors search.
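The general idea behind hierarchical ρ-nearest neighbors search can be illustrated with a small example. The sketch below is not the CHESS implementation: it is a minimal CPU-only cluster tree, assuming Euclidean distance, an arbitrary member as each cluster's center, and a two-pole split. The names `Cluster`, `rho_search`, and `leaf_size` are hypothetical. It shows how the triangle inequality lets a search skip a whole cluster without losing any true hits: if the query is farther from the center than the cluster radius plus ρ, no point in that cluster can lie within ρ of the query.

```python
import math
import random


def dist(a, b):
    # Euclidean distance; an assumption, any metric obeying the
    # triangle inequality would work with the pruning rule below.
    return math.dist(a, b)


class Cluster:
    """A node in a binary cluster tree (hypothetical structure for illustration)."""

    def __init__(self, points, leaf_size=8):
        self.points = points
        # Assumption: any member can act as the center; here, the first point.
        self.center = points[0]
        self.radius = max(dist(self.center, p) for p in points)
        self.children = []
        if len(points) > leaf_size and self.radius > 0:
            # Choose two far-apart poles and assign each point to the nearer one.
            pole_a = max(points, key=lambda p: dist(self.center, p))
            pole_b = max(points, key=lambda p: dist(pole_a, p))
            left = [p for p in points if dist(p, pole_a) <= dist(p, pole_b)]
            right = [p for p in points if dist(p, pole_a) > dist(p, pole_b)]
            if left and right:
                self.children = [Cluster(left, leaf_size), Cluster(right, leaf_size)]


def rho_search(cluster, query, rho, hits=None):
    """Return every point within distance rho of query."""
    if hits is None:
        hits = []
    # Triangle inequality: if d(query, center) > radius + rho, then every
    # point in this cluster is farther than rho from the query, so prune.
    if dist(query, cluster.center) > cluster.radius + rho:
        return hits
    if not cluster.children:
        # Leaf cluster: fall back to a linear scan of its few points.
        hits.extend(p for p in cluster.points if dist(query, p) <= rho)
        return hits
    for child in cluster.children:
        rho_search(child, query, rho, hits)
    return hits


if __name__ == "__main__":
    random.seed(0)
    data = [tuple(random.random() for _ in range(3)) for _ in range(2000)]
    root = Cluster(data)
    query = (0.5, 0.5, 0.5)
    found = rho_search(root, query, 0.1)
    linear = [p for p in data if dist(query, p) <= 0.1]
    print(len(found), len(linear))
```

Because pruning only discards clusters that provably contain no hits, the tree search returns the same set of points as a linear scan. How many comparisons it saves depends on how tightly the data clusters, which is the kind of geometric property the abstract refers to.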

