Itemsets for Real-valued Datasets

02/02/2019
by Nikolaj Tatti, et al.

Pattern mining is one of the most well-studied subfields of exploratory data analysis. While there is a significant body of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little research on mining patterns from real-valued data. In this paper we propose a family of quality scores for real-valued itemsets. We approach the problem by casting the dataset into binary data and computing the support from the binarised data. This naive approach requires us to select thresholds. To remedy this, instead of selecting one set of thresholds, we treat the thresholds as random variables and compute the average support. We show that we can compute this support efficiently, and we also introduce two normalisations, namely comparing the support against the independence assumption and, more generally, against the partition assumption. Our experimental evaluation demonstrates that we can discover statistically significant patterns efficiently.
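The idea of averaging support over random thresholds can be illustrated with a small sketch. This is not the paper's algorithm: it assumes that values are scaled to [0, 1], that each item's threshold is drawn independently and uniformly from [0, 1], and that a row supports an itemset when every item value is at least its threshold. Under those assumptions the expected support has a closed form (the mean over rows of the product of the item values), which the sketch checks against direct sampling. The function names and the toy data are hypothetical.

```python
import random


def average_support(rows, itemset):
    """Expected support under independent uniform [0, 1] thresholds.

    Assumption of this sketch: a row supports the itemset when
    row[i] >= threshold_i for every item i, so the probability that a
    row is covered is the product of its item values.
    """
    total = 0.0
    for row in rows:
        p = 1.0
        for i in itemset:
            p *= row[i]  # P(threshold_i <= row[i]) for a uniform threshold
        total += p
    return total / len(rows)


def sampled_support(rows, itemset, trials=20000, seed=0):
    """Monte Carlo estimate: binarise with random thresholds, average support."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        thresholds = {i: rng.random() for i in itemset}
        covered = sum(
            all(row[i] >= thresholds[i] for i in itemset) for row in rows
        )
        acc += covered / len(rows)
    return acc / trials


def independence_baseline(rows, itemset):
    """Support expected if the items were independent of each other:
    the product of the single-item average supports."""
    base = 1.0
    for i in itemset:
        base *= average_support(rows, [i])
    return base


# Toy data: columns 0 and 1 tend to be high together, column 2 is opposite.
data = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]

exact = average_support(data, [0, 1])
estimate = sampled_support(data, [0, 1])
baseline = independence_baseline(data, [0, 1])
```

On this toy data the average support of items 0 and 1 exceeds the independence baseline, which is the kind of deviation a normalised score would flag. Other threshold distributions, or a single threshold shared by all items, would lead to different formulas.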

research
11/28/2021

A Simple Necessary Condition For Independence of Real-Valued Random Variables

The standard method to check for the independence of two real-valued ran...
research
05/03/2019

A Constructive Proof of a Concentration Bound for Real-Valued Random Variables

Almost 10 years ago, Impagliazzo and Kabanets (2010) gave a new combinat...
research
10/12/2017

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Deriving insights from high-dimensional data is one of the core problems...
research
03/14/2018

SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping

This paper presents a new method, which we call SUSTain, that extends re...
research
12/22/2022

Real-valued affine automata compute beyond Turing machines

We show that bounded-error affine finite automata recognize uncountably ...
research
12/13/2015

Learning the Correction for Multi-Path Deviations in Time-of-Flight Cameras

The Multipath effect in Time-of-Flight(ToF) cameras still remains to be ...
research
12/29/2013

Probabilistic Archetypal Analysis

Archetypal analysis represents a set of observations as convex combinati...
