A new effective and efficient measure for outlying aspect mining

04/28/2020
by Durgesh Samariya, et al.

Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given dataset. Existing OAM algorithms rank subspaces using traditional distance-based or density-based outlier scores. Because these scores depend on the dimensionality of the subspace, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable, but it requires computing the outlier scores of all instances in each subspace. This adds significant computational overhead on top of the already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also find that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called SiNNE, which is independent of the dimensionality of subspaces. This allows scores in subspaces of different dimensionality to be compared directly, without any additional normalisation. Our experimental results show that SiNNE produces results that are better than, or at least as good as, those of existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search.
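
To make the cost of Z-score normalisation concrete, the following is a minimal sketch, not the paper's implementation. It assumes a k-th-nearest-neighbour distance as a stand-in for the "traditional distance-based outlier score"; the function names, the toy dataset and the choice k=2 are all hypothetical. SiNNE itself is not shown, because the abstract does not define it.

```python
import math
import statistics

def knn_distance_score(data, idx, subspace, k=2):
    """Raw outlier score (assumed stand-in): distance from instance `idx`
    to its k-th nearest neighbour, measured only on the `subspace` dimensions."""
    point = [data[idx][d] for d in subspace]
    dists = sorted(
        math.dist(point, [row[d] for d in subspace])
        for j, row in enumerate(data)
        if j != idx
    )
    return dists[k - 1]

def z_score_in_subspace(data, query_idx, subspace, k=2):
    """Z-score normalised outlier score of the query in one subspace.

    The raw score of every instance must be computed first, because the
    mean and standard deviation are taken over the whole dataset. This is
    the per-subspace overhead the abstract refers to.
    """
    scores = [knn_distance_score(data, i, subspace, k) for i in range(len(data))]
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return 0.0  # all instances score the same; the query is not distinguishable
    return (scores[query_idx] - mean) / std

# Hypothetical toy data: the last instance is unusual on dimension 0 only.
data = [(1.0, 5.0), (1.1, 1.0), (0.9, 9.0), (1.2, 3.0), (1.0, 7.0), (5.0, 5.0)]
query_idx = 5

# Rank candidate subspaces of different dimensionality by normalised score.
subspaces = [(0,), (1,), (0, 1)]
ranked = sorted(
    subspaces,
    key=lambda s: z_score_in_subspace(data, query_idx, s),
    reverse=True,
)
for s in ranked:
    print(s, round(z_score_in_subspace(data, query_idx, s), 3))
```

Each call to `z_score_in_subspace` scores all instances, so evaluating many candidate subspaces repeats that full pass once per subspace. A score that is comparable across dimensionalities without normalisation would only need the query's own score in each subspace.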


research
07/03/2017

A simple efficient density estimator that enables fast systematic search

This paper introduces a simple and efficient density estimator that enab...
research
10/28/2015

Universal Dependency Analysis

Most data is multi-dimensional. Discovering whether any subset of dimens...
research
11/13/2020

Efficient Subspace Search in Data Streams

In the real world, data streams are ubiquitous – think of network traffi...
research
05/16/2023

Probabilistic Distance-Based Outlier Detection

The scores of distance-based outlier detection methods are difficult to ...
research
10/01/2013

Joint Bayesian estimation of close subspaces from noisy measurements

In this letter, we consider two sets of observations defined as subspace...
research
07/18/2022

Outlier Explanation via Sum-Product Networks

Outlier explanation is the task of identifying a set of features that di...
research
04/10/2013

A New Approach To Two-View Motion Segmentation Using Global Dimension Minimization

We present a new approach to rigid-body motion segmentation from two vie...
