SUOD: Toward Scalable Unsupervised Outlier Detection

02/08/2020
by Yue Zhao, et al.

Outlier detection is a key field of machine learning for identifying abnormal data objects. Because ground truth is expensive to acquire, unsupervised models are often chosen in practice. To compensate for the unstable nature of unsupervised algorithms, practitioners in high-stakes fields such as finance, health, and security prefer to build a large number of models for further combination and analysis. However, this poses scalability challenges on large, high-dimensional datasets. In this study, we propose a three-module acceleration framework called SUOD to expedite training and prediction with a large number of unsupervised detection models. SUOD's Random Projection module generates lower-dimensional subspaces for high-dimensional datasets while preserving their distance relationships. The Balanced Parallel Scheduling module forecasts the training and prediction cost of models with high confidence, so that the task scheduler can assign a nearly equal workload to each worker for efficient parallelization. SUOD also comes with a Pseudo-supervised Approximation module, which approximates fitted unsupervised models with supervised regressors of lower time complexity for fast prediction on unseen data. This may be regarded as a knowledge distillation process for unsupervised models. Notably, all three modules are independent and can be flexibly "mixed and matched"; a combination of modules can be chosen based on the use case. Extensive experiments on more than 30 benchmark datasets show the efficacy of SUOD, and a comprehensive future development plan is also presented.
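
To make the idea behind balanced scheduling concrete, the following is a minimal sketch of assigning models to workers once a cost forecast is available. It is not SUOD's scheduler: the greedy "largest cost first, least-loaded worker next" heuristic, the function name `balanced_schedule`, the model names, and the cost values are all assumptions chosen for illustration.

```python
import heapq


def balanced_schedule(costs, n_workers):
    """Greedily assign models to workers so per-worker load is roughly equal.

    costs: dict mapping a (hypothetical) model name to its forecast cost.
    Returns (assignment, loads), both keyed by worker index.
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    # Place the most expensive models first, each on the least-loaded worker.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (load + cost, w))

    loads = {w: sum(costs[m] for m in models) for w, models in assignment.items()}
    return assignment, loads


# Hypothetical forecast costs (arbitrary units); a real forecast would come
# from a cost predictor, which this sketch does not implement.
forecast = {
    "knn_1": 9.0, "knn_2": 8.0, "lof_1": 7.0, "lof_2": 6.0,
    "iforest_1": 2.0, "iforest_2": 2.0, "hbos_1": 1.0, "hbos_2": 1.0,
}
assignment, loads = balanced_schedule(forecast, n_workers=2)
print(assignment)
print(loads)
```

With these illustrative costs, splitting the eight models by count alone in the listed order would put all four expensive models on one worker (30 units versus 6), whereas the cost-aware assignment gives each worker 18 units.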


research
09/09/2019

Outlier Detection in High Dimensional Data

High-dimensional data poses unique challenges in outlier detection proce...
research
03/03/2021

Detecting Outliers in High-dimensional Data with Mixed Variable Types using Conditional Gaussian Regression Models

Outlier detection has gained increasing interest in recent years, due to...
research
05/02/2023

Outlier galaxy images in the Dark Energy Survey and their identification with unsupervised machine learning

The Dark Energy Survey is able to collect image data of an extremely lar...
research
02/16/2015

Random Subspace Learning Approach to High-Dimensional Outliers Detection

We introduce and develop a novel approach to outlier detection based on ...
research
10/18/2021

Fast and Exact Outlier Detection in Metric Spaces: A Proximity Graph-based Approach

Distance-based outlier detection is widely adopted in many fields, e.g.,...
research
06/02/2022

Sparx: Distributed Outlier Detection at Scale

There is no shortage of outlier detection (OD) algorithms in the literat...
research
09/20/2022

Unsupervised Early Exit in DNNs with Multiple Exits

Deep Neural Networks (DNNs) are generally designed as sequentially casca...
