Characterizing Transactional Databases for Frequent Itemset Mining

by Christian Lezcano, et al.

This paper presents a study of the characteristics of transactional databases used in frequent itemset mining. Such characterizations have typically been used to benchmark and understand the data mining algorithms that work on these databases. The aim of our study is to show how diverse and representative these benchmarking databases are, both in general and in the context of particular empirical studies found in the literature. Our proposed list of metrics contains many of the existing metrics found in the literature, as well as new ones. Our study shows that this list captures much of the datasets' inner complexity and thus provides a good basis for characterizing transactional datasets. Finally, based on our characterization, we provide a set of representative datasets that may safely be used as a benchmark.
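As a rough illustration of what characterizing a transactional database can look like, the sketch below computes a few statistics commonly reported in the frequent itemset mining literature: number of transactions, number of distinct items, average transaction length, and density. Whether these exact metrics appear in the authors' list is an assumption; the function name `characterize` and the toy database are hypothetical and not taken from the paper.

```python
from collections import Counter


def characterize(transactions):
    """Compute a few basic statistics of a transactional database.

    `transactions` is a list of iterables of hashable items.
    The metrics chosen here are illustrative assumptions, not the
    paper's own list.
    """
    # Treat each transaction as a set, so repeated items count once.
    txns = [set(t) for t in transactions]
    n_transactions = len(txns)

    # Number of transactions containing each item (its absolute support).
    item_counts = Counter(item for t in txns for item in t)
    n_items = len(item_counts)

    # Total number of (transaction, item) occurrences.
    total = sum(len(t) for t in txns)

    avg_len = total / n_transactions if n_transactions else 0.0
    # Density: fraction of filled cells in the transaction-by-item matrix.
    cells = n_transactions * n_items
    density = total / cells if cells else 0.0

    return {
        "n_transactions": n_transactions,
        "n_items": n_items,
        "avg_transaction_length": avg_len,
        "density": density,
    }


# Hypothetical toy database with four transactions over items a-d.
toy_db = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b", "c", "d"]]
stats = characterize(toy_db)
print(stats)
```

Comparing such statistics across benchmark datasets is one simple way to see how much they differ from one another. Density, for example, is often used to separate sparse market-basket data from dense datasets.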




