Dynamic Enumeration of Similarity Joins

05/05/2021
by Pankaj K. Agarwal, et al.

This paper considers enumerating answers to similarity-join queries under dynamic updates: given two sets A, B of n points in ℝ^d, a metric ϕ(·,·), and a distance threshold r > 0, report all pairs of points (a, b) ∈ A × B with ϕ(a,b) ≤ r. Our goal is to store A and B in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be updated efficiently when a point is inserted into or deleted from A or B. We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of similarity joins, we present near-linear-size data structures for the ℓ_1 and ℓ_∞ metrics with log^O(1) n update time and delay. We show that such a data structure is not feasible for the ℓ_2 metric for d ≥ 4. For approximate enumeration of similarity joins, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the ℓ_p metric, with log^O(1) n delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality-sensitive hashing (LSH).
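To make the problem statement concrete, the following is a minimal Python sketch of a dynamic similarity join under the ℓ_∞ metric. It is a naive uniform-grid baseline, not the data structure proposed in the paper: the class name `GridSimilarityJoin` and its methods are hypothetical, and the sketch offers no worst-case bound on the delay between consecutive reported pairs, which is the guarantee the paper is concerned with.

```python
from collections import defaultdict
from itertools import product
import math


class GridSimilarityJoin:
    """Naive dynamic l_inf similarity join (illustrative baseline only).

    Points are bucketed into grid cells of side r. Two points within
    l_inf distance r always fall in the same or in adjacent cells, so
    enumeration only compares points from neighboring cells.
    """

    def __init__(self, r, d):
        self.r = r                      # distance threshold
        self.d = d                      # dimension
        self.cells_a = defaultdict(set)  # cell -> points of A
        self.cells_b = defaultdict(set)  # cell -> points of B

    def _cell(self, p):
        return tuple(math.floor(x / self.r) for x in p)

    def _side(self, side):
        return self.cells_a if side == "A" else self.cells_b

    def insert(self, side, p):
        self._side(side)[self._cell(p)].add(tuple(p))

    def delete(self, side, p):
        cells = self._side(side)
        key = self._cell(p)
        cells[key].discard(tuple(p))
        if not cells[key]:
            del cells[key]              # drop empty cells

    def enumerate_pairs(self):
        """Yield every (a, b) in A x B with l_inf distance at most r.

        The time between two yields is NOT bounded here: many cell
        pairs may be scanned without producing any result.
        """
        offsets = list(product((-1, 0, 1), repeat=self.d))
        for key, pts_a in self.cells_a.items():
            for off in offsets:
                nb = tuple(k + o for k, o in zip(key, off))
                pts_b = self.cells_b.get(nb)
                if not pts_b:
                    continue
                for a in pts_a:
                    for b in pts_b:
                        if max(abs(x - y) for x, y in zip(a, b)) <= self.r:
                            yield a, b
```

A short usage example: after `idx = GridSimilarityJoin(r=1.0, d=2)`, points are added with `idx.insert("A", (0.0, 0.0))` or `idx.insert("B", (0.5, -0.5))`, removed with `idx.delete(...)`, and `list(idx.enumerate_pairs())` reports the current join result. Updates take constant time per point in this sketch, but enumeration can scan 3^d neighboring cells per occupied cell without reporting anything, so the delay may be large.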


research
03/15/2020

Four-Dimensional Dominance Range Reporting in Linear Space

In this paper we study the four-dimensional dominance range reporting pr...
research
05/25/2019

Robotic bees: Algorithms for collision detection and prevention

In the following paper we will discuss data structures suited for distan...
research
04/09/2018

Set Similarity Search for Skewed Data

Set similarity join, as well as the corresponding indexing problem set s...
research
01/26/2021

Sampling a Near Neighbor in High Dimensions – Who is the Fairest of Them All?

Similarity search is a fundamental algorithmic primitive, widely used in...
research
04/16/2018

Adaptive MapReduce Similarity Joins

Similarity joins are a fundamental database operation. Given data sets S...
research
11/01/2020

A Lower Bound for Dynamic Fractional Cascading

We investigate the limits of one of the fundamental ideas in data struct...
research
12/15/2017

Dynamic smooth compressed quadtrees (Fullversion)

We introduce dynamic smooth (a.k.a. balanced) compressed quadtrees with ...
