Categorical anomaly detection in heterogeneous data using minimum description length clustering

06/14/2020
by James Cheney, et al.

Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.
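The abstract does not spell out the encoding, the base models, or the exact clustering variant, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each mixture component is a set of independent per-attribute categorical distributions with Laplace smoothing, that records are hard-assigned to the component that encodes them in the fewest bits (a k-means-style loop), and that the anomaly score is the shortest code length plus log2(k) bits for the component index. The function names `fit_model`, `code_length`, `mdl_kmeans`, and `anomaly_scores` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def fit_model(records, domains, alpha=1.0):
    """Fit one Laplace-smoothed categorical distribution per attribute.

    Assumption: attributes are modelled independently; the paper's base
    models may be more sophisticated.
    """
    model = []
    for j, domain in enumerate(domains):
        counts = Counter(r[j] for r in records)
        total = len(records) + alpha * len(domain)
        model.append({v: (counts[v] + alpha) / total for v in domain})
    return model


def code_length(record, model):
    """Bits needed to encode a record: minus the sum of log2 p(value)."""
    return -sum(math.log2(model[j][v]) for j, v in enumerate(record))


def mdl_kmeans(records, k, iters=20, seed=0, init=None):
    """Alternate between refitting k models and reassigning each record
    to the model that gives it the shortest code length."""
    domains = [sorted({r[j] for r in records}) for j in range(len(records[0]))]
    rng = random.Random(seed)
    # Random initial assignment unless the caller supplies one.
    assign = list(init) if init is not None else [rng.randrange(k) for _ in records]

    def fit_all(assignment):
        return [
            fit_model([r for r, a in zip(records, assignment) if a == c], domains)
            for c in range(k)
        ]

    for _ in range(iters):
        models = fit_all(assign)
        new_assign = [
            min(range(k), key=lambda c: code_length(r, models[c]))
            for r in records
        ]
        if new_assign == assign:  # assignments are stable: converged
            break
        assign = new_assign
    return fit_all(assign), assign


def anomaly_scores(records, models):
    """Score = bits for the component index + shortest per-component code.

    Assumption: a flat log2(k) cost for the index. Records must only use
    values seen when the models were fitted.
    """
    index_bits = math.log2(len(models))
    return [index_bits + min(code_length(r, m) for m in models) for r in records]
```

To use the sketch, pass a list of equal-length tuples of categorical values to `mdl_kmeans`, then rank records by `anomaly_scores` in descending order; records that no component can encode cheaply come first. Like ordinary k-means, the loop can settle in a poor local optimum, so trying several seeds and keeping the run with the smallest total code length is a reasonable precaution.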


