Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

09/22/2017
by Janis Kalofolias, et al.

Subgroup discovery is a local pattern mining technique for finding interpretable descriptions of sub-populations that stand out with respect to a given target variable, that is, sub-populations that are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable, that is, if the distribution of this control variable within the subgroup is the same, or almost the same, as over the whole data. We formalise this objective and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of both node evaluations and running time.
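
To make the idea of a subgroup that is both exceptional and representative concrete, the following is a minimal sketch of one possible quality function for a numeric target and a binary control variable. The abstract does not state the paper's actual objective, so everything here is an assumption made for illustration: the function names `exceptionality`, `representativeness` and `quality` are hypothetical, exceptionality is taken to be the coverage-weighted mean shift of the target, and representativeness is taken to be one minus the total variation distance between the subgroup's and the global control distribution (for a binary variable, the absolute difference of the two rates).

```python
from statistics import mean


def exceptionality(target, members):
    """Coverage-weighted mean shift of the target (illustrative assumption)."""
    if not members:
        return 0.0
    coverage = len(members) / len(target)
    sub_mean = mean(target[i] for i in members)
    return coverage * (sub_mean - mean(target))


def representativeness(control, members):
    """One minus the total variation distance between the subgroup's and the
    global distribution of a binary (0/1) control variable (assumption)."""
    if not members:
        return 0.0
    p_global = sum(control) / len(control)
    p_sub = sum(control[i] for i in members) / len(members)
    return 1.0 - abs(p_sub - p_global)


def quality(target, control, members):
    """Hypothetical combined score: the exceptionality is discounted when the
    control distribution in the subgroup drifts from the global one."""
    gain = max(0.0, exceptionality(target, members))
    return gain * representativeness(control, members)


# Toy data: five rows, a numeric target and a binary control variable.
target = [1, 2, 3, 10, 12]
control = [0, 1, 0, 1, 0]

balanced = [3, 4]  # row indices; control rate 0.5 against a global rate of 0.4
skewed = [3]       # row index; control rate 1.0 against a global rate of 0.4

print(quality(target, control, balanced))
print(quality(target, control, skewed))
```

On this toy data the subgroup with rows 3 and 4 scores higher than the single-row subgroup, both because it covers more rows and because its control distribution stays closer to the global one. A branch-and-bound search of the kind the abstract describes would additionally need an optimistic estimator, an upper bound on `quality` over all refinements of a subgroup, which this sketch does not attempt to reproduce.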

research
01/26/2017

Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

Existing algorithms for subgroup discovery with numerical targets do not...
research
01/27/2017

Efficiently Summarising Event Sequences with Rich Interleaving Patterns

Discovering the key structure of a database is one of the main goals of ...
research
08/30/2019

Discovering Reliable Correlations in Categorical Data

In many scientific tasks we are interested in discovering whether there ...
research
10/26/2020

Interpretable Assessment of Fairness During Model Evaluation

For companies developing products or algorithms, it is important to unde...
research
06/16/2020

Discovering outstanding subgroup lists for numeric targets using MDL

The task of subgroup discovery (SD) is to find interpretable description...
research
05/20/2020

DisCoveR: Accurate Efficient Discovery of Declarative Process Models

Declarative process modeling formalisms - which capture high-level proce...
research
04/26/2019

Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms

In the field of multi-objective optimization algorithms, multi-objective...
