Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

01/26/2017
by Mario Boley, et al.

Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
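To make the shape of such an objective concrete, here is a minimal Python sketch of one function in the class the abstract describes: non-decreasing in subgroup size, dependent on the subgroup median, and non-increasing in the average absolute deviation from the median. The specific functional form (coverage times a dispersion correction times the positive median shift) and all function names are illustrative assumptions, not taken from the abstract. The sketch only scores a given subgroup; it does not implement the tight optimistic estimator or the branch-and-bound search.

```python
from statistics import median


def mean_abs_dev_from_median(values):
    """Average absolute deviation of the values around their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def dispersion_corrected_score(subgroup, population):
    """Illustrative objective (assumed form, not the paper's definition).

    Grows with subgroup size and with the subgroup median's shift above
    the population median; shrinks as the subgroup's dispersion around
    its own median grows.
    """
    coverage = len(subgroup) / len(population)
    shift = max(0.0, median(subgroup) - median(population))
    pop_disp = mean_abs_dev_from_median(population)
    sub_disp = mean_abs_dev_from_median(subgroup)
    # Guard against a constant population, where the ratio is undefined.
    correction = max(0.0, 1.0 - sub_disp / pop_disp) if pop_disp > 0 else 0.0
    return coverage * correction * shift


population = [1, 2, 3, 4, 10, 11, 12, 13]
tight = [10, 11, 12, 13]   # same median as `loose`, low dispersion
loose = [1, 11, 12, 13]    # same size and median, high dispersion

print(dispersion_corrected_score(tight, population))
print(dispersion_corrected_score(loose, population))
```

Both example subgroups have the same size and the same median, so an objective that ignores dispersion could not tell them apart; under the assumed form above, the tighter subgroup scores higher.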

research
09/22/2017

Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

Subgroup discovery is a local pattern mining technique to find interpret...
research
07/24/2019

Medians in median graphs in linear time

The median of a graph G is the set of all vertices x of G minimizing the...
research
06/18/2020

Median Matrix Completion: from Embarrassment to Optimality

In this paper, we consider matrix completion with absolute deviation los...
research
05/25/2017

Discovering Reliable Approximate Functional Dependencies

Given a database and a target attribute of interest, how can we tell whe...
research
08/30/2019

Discovering Reliable Correlations in Categorical Data

In many scientific tasks we are interested in discovering whether there ...
research
10/01/2019

Confidence intervals for median absolute deviations

The median absolute deviation (MAD) is a robust measure of scale that is...
