Technical Report: KNN Joins Using a Hybrid Approach: Exploiting CPU/GPU Workload Characteristics

10/10/2018
by Michael Gowanlock, et al.

This paper studies the KNN self-join: finding the K nearest neighbors (KNN) of every point in a dataset. Typical solutions to KNN searches use an index to prune the search, reducing the number of candidate points that may be among the K nearest neighbors of each query point. In high dimensions, index searches degrade, which makes the KNN self-join prohibitively expensive in some scenarios. Furthermore, a large number of distance calculations is needed to determine which points are nearest to each query point. To address these challenges, we propose a hybrid CPU/GPU approach. Because the CPU and GPU are considerably different architectures that are best exploited using different algorithms, we advocate splitting the work between them based on the characteristic workloads defined by the query points in the dataset. Specifically, we assign dense regions to the GPU and sparse regions to the CPU to exploit the relative strengths of each architecture. Critically, we find that the performance gains over the reference implementation across four real-world datasets are a function of the data properties (size, dimensionality, distribution) and the number of neighbors, K.
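As a rough illustration of the two ideas in the abstract, the sketch below shows a brute-force KNN self-join and one possible way to separate query points into dense and sparse groups. The abstract does not say how density is measured or which algorithm runs on each device, so the uniform grid, the `cell_width` and `dense_threshold` parameters, and the function names are all assumptions made for this example. Both functions run on the CPU here; in a hybrid scheme the dense group would be the GPU's share of the work.

```python
import math
from collections import defaultdict


def knn_self_join(points, k):
    """Brute-force KNN self-join: for each point, return the indices of its
    k nearest other points, closest first (no index, no pruning)."""
    result = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        result.append([j for _, j in dists[:k]])
    return result


def split_by_density(points, cell_width, dense_threshold):
    """Hypothetical workload split: bin points into a uniform grid and label
    a point 'dense' if its cell holds at least dense_threshold points."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell_width) for c in p)
        cells[key].append(i)
    dense, sparse = [], []
    for members in cells.values():
        target = dense if len(members) >= dense_threshold else sparse
        target.extend(members)
    return sorted(dense), sorted(sparse)


# Four clustered points and two isolated ones (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (9.0, 1.0)]
neighbors = knn_self_join(points, k=2)
dense, sparse = split_by_density(points, cell_width=1.0, dense_threshold=3)
```

With this toy data, the four clustered points fall into one grid cell and form the dense group, while the two isolated points form the sparse group. The brute-force join costs a distance calculation for every pair of points, which is the expense that index pruning and GPU parallelism are meant to reduce.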

research
03/12/2018

GPU Accelerated Self-join for the Distance Similarity Metric

The self-join finds all objects in a dataset within a threshold of each ...
research
09/26/2018

GPU Accelerated Similarity Self-Join for Multi-Dimensional Data

The self-join finds all objects in a dataset that are within a search di...
research
06/19/2020

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Finding nearest neighbors in high-dimensional spaces is a fundamental op...
research
01/22/2018

Scalable Secure Computation of Statistical Functions with Applications to k-Nearest Neighbors

Given a set S of n d-dimensional points, the k-nearest neighbors (KNN) i...
research
03/01/2017

Fast k-Nearest Neighbour Search via Prioritized DCI

Most exact methods for k-nearest neighbour search suffer from the curse ...
research
03/02/2023

RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing

Data management on GPUs has become increasingly relevant due to a tremen...
research
12/21/2018

Speeding-up the Verification Phase of Set Similarity Joins in the GPGPU paradigm

We investigate the problem of exact set similarity joins using a co-proc...
