Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

by Felipe Llinares-Lopez et al.
ETH Zurich

We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, that are statistically significantly enriched in one of two classes. Our method rigorously corrects for multiple hypothesis testing and for correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments on popular real-world benchmark datasets for pattern mining, Westfall-Young light dramatically outperforms the current state-of-the-art approach in both runtime and memory efficiency. The key to this efficiency is that, unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence lists of all frequent patterns. Westfall-Young light thus opens the door to significant pattern mining on large datasets for which runtime or memory costs were previously prohibitive.
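The Westfall-Young permutation procedure can be illustrated with a deliberately naive sketch. This is not the authors' Westfall-Young light algorithm. The sketch makes several assumptions. The patterns (for example, frequent itemsets) are taken to be already enumerated by some external miner and to arrive as a stream of (name, occurrence set) pairs. A two-sided Fisher's exact test is used as the per-pattern test. No search-space pruning is performed. The function names `fisher_two_sided` and `westfall_young` are hypothetical. The sketch does show the two properties the abstract emphasizes. First, the label permutations are drawn once and each pattern is visited only once. Second, only a running minimum p-value per permutation is kept, rather than every pattern's occurrence list.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def fisher_two_sided(a: int, x: int, n: int, n1: int) -> float:
    """Hypothetical helper: two-sided Fisher's exact test p-value for a pattern
    with support x, of which a occurrences fall in class 1
    (n transactions in total, n1 of them in class 1)."""
    denom = math.comb(n, x)
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    # Hypergeometric probabilities of every feasible table given the margins.
    probs = [math.comb(n1, k) * math.comb(n - n1, x - k) / denom
             for k in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def westfall_young(
    patterns: Iterable[Tuple[str, Set[int]]],
    labels: Sequence[int],
    n_perm: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Dict[str, float]]:
    """Naive Westfall-Young sketch (hypothetical, no pruning).
    Returns the FWER-corrected threshold and each pattern's observed p-value."""
    n, n1 = len(labels), sum(labels)
    rng = random.Random(seed)
    # Draw all label permutations once; they are reused for every pattern.
    perms: List[List[int]] = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        perms.append(perm)
    min_p = [1.0] * n_perm  # running minimum p-value per permutation
    observed: Dict[str, float] = {}
    # Single pass: each occurrence set is used here and then discarded;
    # only one scalar p-value per pattern is retained.
    for name, occ in patterns:
        x = len(occ)
        observed[name] = fisher_two_sided(sum(labels[i] for i in occ), x, n, n1)
        for j, perm in enumerate(perms):
            p = fisher_two_sided(sum(perm[i] for i in occ), x, n, n1)
            if p < min_p[j]:
                min_p[j] = p
    # Threshold = (floor(alpha * J) + 1)-th smallest per-permutation minimum.
    # Calling p < threshold significant keeps the estimated FWER <= alpha (alpha < 1).
    k = math.floor(alpha * n_perm)
    threshold = sorted(min_p)[k]
    return threshold, observed


# Toy usage: 20 transactions, the first 10 labelled class 1.
labels = [1] * 10 + [0] * 10
patterns = [
    ("A", set(range(10))),   # occurs exactly in the class-1 transactions
    ("B", {0, 1, 10, 11}),   # balanced across the two classes
]
threshold, pvals = westfall_young(iter(patterns), labels, n_perm=200, alpha=0.05)
print([name for name, p in pvals.items() if p < threshold])
```

With 200 permutations and α = 0.05, the threshold is the 11th smallest per-permutation minimum p-value. Pattern A has an observed p-value of 2/C(20,10) ≈ 1.1e-5 and should fall below this threshold. The balanced pattern B has p = 1 and should not, so the example should print `['A']`. The single pass over patterns is the design point this sketch shares with the abstract's description. Making such a scheme fast on real datasets requires additional machinery that is not shown here.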

