Explaining Aggregates for Exploratory Analytics

12/29/2018
by Fotis Savva, et al.

Analysts wishing to explore multivariate data spaces typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest, and then apply aggregation functions whose results guide their exploratory analytics. However, such aggregate query (AQ) results are simple scalars and, as such, convey limited information about the queried subspaces. We address this shortcoming, aiding analysts in exploring and understanding data subspaces, by contributing a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA's AQ explanations are represented as functions obtained by solving a three-fold joint optimization problem. Explanations take the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed on-line, solely by monitoring AQs and their answers. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
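
To make the idea of a piecewise-linear explanation concrete, the following is a minimal sketch and not XAXA's actual method: it does not reproduce the three-fold joint optimization, and it assumes a single fixed query center, hand-picked knots, and a synthetic, hypothetical query log. It fits a continuous piecewise-linear function of the query radius to previously observed (radius, answer) pairs using ordinary least squares over a hinge basis, then evaluates that function for unseen radii without touching any data store. All names (`hinge_features`, `fit_explanation`, `explain`, `true_answer`) are illustrative.

```python
import numpy as np

def hinge_features(theta, knots):
    """Basis for a continuous piecewise-linear function of the radius theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    cols = [np.ones_like(theta), theta]
    # Each hinge term lets the slope change at one knot.
    cols += [np.maximum(theta - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_explanation(radii, answers, knots):
    """Least-squares fit using only logged (radius, answer) pairs."""
    X = hinge_features(radii, knots)
    y = np.asarray(answers, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def explain(coef, knots, theta):
    """Evaluate the fitted explanation function; no data access is needed."""
    return hinge_features(theta, knots) @ coef

# Hypothetical stand-in for the answers a DB would have returned
# for radius queries around one fixed center (an assumption of this sketch).
def true_answer(t):
    t = np.asarray(t, dtype=float)
    return np.where(t < 1.0, 10.0 * t, 10.0 + 40.0 * (t - 1.0))

rng = np.random.default_rng(0)
radii = rng.uniform(0.0, 2.0, size=200)   # radii of past queries
answers = true_answer(radii)              # their observed aggregate answers

knots = [0.5, 1.0, 1.5]                   # assumed, hand-picked breakpoints
coef = fit_explanation(radii, answers, knots)

# Explore how the aggregate would change with the radius, with no new queries.
grid = np.linspace(0.0, 2.0, 5)
approx = explain(coef, knots, grid)
```

Here the fitted function lets an analyst see how the aggregate varies as the radius grows, rather than a single scalar. Choosing the knots by hand is a simplification; the abstract indicates that XAXA learns its functions from the query workload.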

research
02/10/2021

Explaining Inference Queries with Bayesian Optimization

Obtaining an explanation for an SQL query result can enrich the analysis...
research
09/02/2022

DPXPlain: Privately Explaining Aggregate Query Answers

Differential privacy (DP) is the state-of-the-art and rigorous notion of...
research
03/29/2021

Putting Things into Context: Rich Explanations for Query Answers using Join Graphs (extended version)

In many data analysis applications, there is a need to explain why a sur...
research
03/21/2019

Explain3D: Explaining Disagreements in Disjoint Datasets

Data plays an important role in applications, analytic processes, and ma...
research
08/13/2019

Adaptive Learning of Aggregate Analytics under Dynamic Workloads

Large organizations have seamlessly incorporated data-driven decision ma...
research
03/12/2021

To not miss the forest for the trees – a holistic approach for explaining missing answers over nested data (extended version)

Query-based explanations for missing answers identify which operators of...
research
08/21/2019

GeoBlocks: A Query-Driven Storage Layout for Geospatial Data

City authorities need to analyze urban geospatial data to improve transp...
