NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT

by Rojeh Hayek, et al.

SQL queries with the AND, OR, and NOT operators constitute a broad class of frequently used queries, so estimating their cardinality is important for query optimization. In addition, a query planner requires the set-theoretic cardinality (i.e., without duplicates) both for queries with DISTINCT and during planning itself, for example when considering sorting options. Yet, despite the importance of estimating query cardinalities in the presence of DISTINCT, AND, OR, and NOT, many cardinality estimation methods are limited to conjunctive queries with duplicates counted. This work focuses on two methods for handling this deficiency, each of which can be applied to any such limited cardinality estimation model. First, we describe a specialized deep learning scheme, PUNQ, which is tailored to representing conjunctive SQL queries and predicting the percentage of unique rows within the query's result when duplicates are counted. Using the percentages predicted by PUNQ, we can transform any cardinality estimation method that handles only conjunctive queries and estimates cardinalities with duplicates (e.g., MSCN) into a method that estimates query cardinalities without duplicates. This enables estimating cardinalities of queries with the DISTINCT keyword. Second, we describe a recursive algorithm, GenCrd, for extending any cardinality estimation method M that handles only conjunctive queries to one that estimates cardinalities for more general queries (those that include AND, OR, and NOT), without changing the method M itself. Our evaluation is carried out on a challenging, real-world database with general queries that include either the DISTINCT keyword or the AND, OR, and NOT operators. Experimentally, we show that the proposed methods obtain cardinality estimates with the same level of accuracy as that of the original transformed methods.
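The two transformations described above can be illustrated with a minimal Python sketch. It is not the paper's PUNQ or GenCrd implementation, which the abstract does not spell out. It assumes only that a unique-row rate multiplies a with-duplicates estimate, and that OR and NOT can be reduced to conjunctions with the standard set identities (inclusion-exclusion and complement). The names `distinct_cardinality`, `general_cardinality`, the tuple-based expression encoding, and the estimator signatures are all hypothetical.

```python
from typing import Callable, Tuple, Union

# Hypothetical encoding: a leaf is a predicate name (str); inner nodes are
# ("and", a, b), ("or", a, b) or ("not", a).
Expr = Union[str, tuple]
# Hypothetical conjunctive estimator: takes a tuple of predicate names that are
# implicitly ANDed together and returns an estimated row count.
ConjEstimator = Callable[[Tuple[str, ...]], float]


def distinct_cardinality(card_with_duplicates: float, unique_rate: float) -> float:
    """Scale a with-duplicates estimate by a predicted fraction of unique rows."""
    rate = min(max(unique_rate, 0.0), 1.0)  # clamp the prediction to [0, 1]
    return rate * card_with_duplicates


def general_cardinality(pending: Tuple[Expr, ...],
                        ctx: Tuple[str, ...],
                        est: ConjEstimator) -> float:
    """Estimate rows satisfying every predicate in ctx AND every expr in pending.

    Only conjunctive queries are ever passed to `est`; `est` itself is unchanged.
    """
    if not pending:
        return est(ctx)
    head, rest = pending[0], pending[1:]
    if isinstance(head, str):  # atomic predicate: move it into the conjunction
        return general_cardinality(rest, ctx + (head,), est)
    op = head[0]
    if op == "and":
        return general_cardinality((head[1], head[2]) + rest, ctx, est)
    if op == "not":  # |C and not A| = |C| - |C and A|
        return (general_cardinality(rest, ctx, est)
                - general_cardinality((head[1],) + rest, ctx, est))
    if op == "or":  # inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
        a, b = head[1], head[2]
        return (general_cardinality((a,) + rest, ctx, est)
                + general_cardinality((b,) + rest, ctx, est)
                - general_cardinality((a, b) + rest, ctx, est))
    raise ValueError(f"unknown operator: {op!r}")
```

In this sketch the number of calls to the conjunctive estimator grows quickly with the number of OR operators, and estimation errors can accumulate through the additions and subtractions, so it should be read as an illustration of the idea rather than as a practical algorithm.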




Improved Cardinality Estimation by Learning Queries Containment Rates

The containment rate of query Q1 in query Q2 over database D is the perc...

Estimating Cardinalities with Deep Sketches

We introduce Deep Sketches, which are compact models of databases that a...

Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation

Estimating the cardinality (i.e., the number of answers) of conjunctive ...

SafeBound: A Practical System for Generating Cardinality Bounds

Recent work has reemphasized the importance of cardinality estimates for...

Learned Cardinalities: Estimating Correlated Joins with Deep Learning

We describe a new deep learning approach to cardinality estimation. MSCN...

Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data

Cardinality estimation is a fundamental task in database query processin...

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of qu...
