Discovering Episodes with Compact Minimal Windows

04/15/2019
by Nikolaj Tatti, et al.

Discovering the most interesting patterns is the key problem in the field of pattern mining. While ranking or selecting patterns is well-studied for itemsets, it is surprisingly under-researched for other, more complex pattern types. In this paper we propose a new quality measure for episodes. An episode is essentially a set of events with possible restrictions on the order of events. We say that an episode is significant if its occurrence is abnormally compact, that is, only a few gap events occur between the actual episode events, when compared to the expected length according to the independence model. We can apply this measure as a post-pruning step by first discovering frequent episodes and then ranking them according to this measure. In order to compute the score, we need to compute the mean and the variance according to the independence model. As a main technical contribution we introduce a technique that allows us to compute these values. Such a task is surprisingly complex, and in order to solve it we develop intricate finite state machines that allow us to compute the needed statistics. We also show that asymptotically our score can be interpreted as a P-value. In our experiments we demonstrate that, despite its intricacy, our ranking is fast: we can rank tens of thousands of episodes in seconds. Our experiments with text data demonstrate that our measure ranks interpretable episodes high.
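To make the notion of a compact occurrence concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the simplest case of a serial episode, where the events must occur in a fixed order, and finds its minimal windows in a sequence together with the number of gap events inside each one. The function names `earliest_end`, `minimal_windows`, and `gap_counts` are hypothetical, chosen for illustration. The sketch covers only the observed side of the measure: it does not compute the mean and variance under the independence model, which the paper derives with finite state machines.

```python
from typing import List, Optional, Sequence, Tuple


def earliest_end(seq: Sequence[str], episode: Sequence[str], start: int) -> Optional[int]:
    """Greedily match `episode` as a subsequence of seq[start:].

    Returns the index of the last matched event, or None if no match exists.
    """
    k = 0  # position of the next episode event to match
    for j in range(start, len(seq)):
        if seq[j] == episode[k]:
            k += 1
            if k == len(episode):
                return j
    return None


def minimal_windows(seq: Sequence[str], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of all minimal windows of a serial episode.

    A window is minimal if it contains the episode but no proper sub-window does.
    Assumption: the episode is serial (a fixed total order of events).
    """
    if not episode:
        return []
    candidates = []
    for i, event in enumerate(seq):
        if event != episode[0]:
            continue
        end = earliest_end(seq, episode, i)
        if end is None:
            break  # later starts cannot match either
        candidates.append((i, end))
    windows = []
    for idx, (start, end) in enumerate(candidates):
        is_last = idx + 1 == len(candidates)
        # Earliest ends never decrease with the start, so a window is minimal
        # exactly when the next candidate start cannot finish by the same end.
        if is_last or candidates[idx + 1][1] > end:
            windows.append((start, end))
    return windows


def gap_counts(windows: List[Tuple[int, int]], episode_size: int) -> List[int]:
    """Number of non-episode (gap) events inside each window."""
    return [end - start + 1 - episode_size for start, end in windows]


seq = list("axbaabcab")
windows = minimal_windows(seq, ("a", "b"))
gaps = gap_counts(windows, 2)
```

For the sequence `axbaabcab` and the serial episode `a` then `b`, the sketch yields the windows `(0, 2)`, `(4, 5)`, and `(7, 8)` with gap counts `1`, `0`, and `0`. The candidate `(3, 5)` is discarded because the shorter window `(4, 5)` lies inside it.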


research
02/07/2019

Significance of Episodes Based on Minimal Windows

Discovering episodes, frequent sets of events from a sequence has been a...
research
02/04/2019

Ranking Episodes using a Partition Model

One of the biggest setbacks in traditional frequent pattern mining is th...
research
04/16/2019

Mining Closed Episodes with Simultaneous Events

Sequential pattern discovery is a well-studied field in data mining. Epi...
research
02/18/2019

Finding Robust Itemsets Under Subsampling

Mining frequent patterns is plagued by the problem of pattern explosion ...
research
04/14/2019

Mining Closed Strict Episodes

Discovering patterns in a sequence is an important aspect of data mining...
research
02/17/2020

Semantics of negative sequential patterns

In the field of pattern mining, a negative sequential pattern is specifi...
research
05/19/2018

Free-rider Episode Screening via Dual Partition Model

One of the drawbacks of frequent episode mining is that overwhelmingly m...
