Computing the Difference of Conjunctive Queries Efficiently

by   Xiao Hu, et al.

We investigate how to efficiently compute the difference result of two (or multiple) conjunctive queries, which is the last operator in relational algebra to be unraveled. The standard approach in practical database systems is to materialize the results for every input query as a separate set, and then compute the difference of two (or multiple) sets. This approach is bottlenecked by the complexity of evaluating every input query individually, which could be very expensive, particularly when there are only a few results in the difference. In this paper, we introduce a new approach by exploiting the structural property of input queries and rewriting the original query by pushing the difference operator down as much as possible. We show that for a large class of difference queries, this approach can lead to a linear-time algorithm, in terms of the input size and (final) output size, i.e., the number of query results that survive from the difference operator. We complete this result by showing the hardness of computing the remaining difference queries in linear time. Although a linear-time algorithm is hard to achieve in general, we also provide some heuristics that can provably improve the standard approach. At last, we compare our approach with standard SQL engines over graph and benchmark datasets. The experiment results demonstrate order-of-magnitude speedups achieved by our approach over the vanilla SQL.


page 2

page 12


Relational Playground: Teaching the Duality of Relational Algebra and SQL

Students in introductory data management courses are often taught how to...

Shrinkwrap: Differentially-Private Query Processing in Private Data Federations

A private data federation is a set of autonomous databases that share a ...

In-database connected component analysis

We describe a Big Data-practical, SQL-implementable algorithm for effici...

Efficient Evaluation of Arbitrary Relational Calculus Queries

The relational calculus (RC) is a concise, declarative query language. H...

Computing FO-Rewritings in EL in Practice: from Atomic to Conjunctive Queries

A prominent approach to implementing ontology-mediated queries (OMQs) is...

Parameterized Algorithms for Scalable Interprocedural Data-flow Analysis

Data-flow analysis is a general technique used to compute information of...

Order of Magnitude Comparisons of Distance

Order of magnitude reasoning - reasoning by rough comparisons of the siz...

Please sign up or login with your details

Forgot password? Click here to reset