A flexible outlier detector based on a topology given by graph communities

02/18/2020
by O. Ramos Terrades, et al.

Outlier, or anomaly, detection is essential for the optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, and security threat detection. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in unbalanced problems with small sample sizes. However, a main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph that encodes mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that our approach overall outperforms both local and global strategies in multi-view and single-view settings.
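The pipeline outlined in the abstract (a weighted mutual nearest neighbor graph, graph communities used as neighborhoods, and a local measure of label heterogeneity) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the function names, the inverse-distance edge weight, the choice of k, and the disagreement-based score are all hypothetical. In particular, connected components stand in for a real community detection algorithm, so the edge weights are stored but not used, and each sample receives a single neighborhood rather than the multiple neighborhoods the abstract describes.

```python
import math


def mutual_knn_graph(points, k):
    """Build {i: {j: weight}} with an edge only when i and j are each
    among the other's k nearest neighbors (mutual nearest neighbors)."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                # Assumed weighting: closer pairs get larger weights.
                graph[i][j] = 1.0 / (1.0 + math.dist(points[i], points[j]))
    return graph


def connected_components(graph):
    """Stand-in for community detection: group nodes by reachability."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


def heterogeneity_scores(points, labels, k=2):
    """Score each sample by the fraction of the other samples in its
    group that carry a different label (hypothetical local measure)."""
    scores = [0.0] * len(points)
    for group in connected_components(mutual_knn_graph(points, k)):
        for i in group:
            others = [j for j in group if j != i]
            if not others:
                # Assumption: an isolated sample has no supporting neighbors.
                scores[i] = 1.0
                continue
            differing = sum(1 for j in others if labels[j] != labels[i])
            scores[i] = differing / len(others)
    return scores


# Two well-separated groups; sample 3 carries a label unlike its neighbors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
scores = heterogeneity_scores(points, labels, k=2)
print(scores)
```

In this toy data, the sample whose label disagrees with every other member of its group gets the highest score. A closer approximation of the described method would replace `connected_components` with a weighted community detection algorithm and combine scores over several neighborhoods per sample.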


