Robust learning of data anomalies with analytically-solvable entropic outlier sparsification

12/22/2021
by Illia Horenko, et al.

Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for detecting data anomalies in a broad class of learning methods, including unsupervised problems (such as the detection of non-Gaussian outliers in mostly-Gaussian data) and supervised learning with mislabeled data. EOS builds on a derived analytic closed-form solution of the (weighted) expected error minimization problem subject to Shannon entropy regularization. In contrast to common regularization strategies, whose computational costs scale polynomially with the data dimension, the identified closed-form solution is proven to impose additional iteration costs that scale linearly with the statistics size and are independent of the data dimension. The obtained analytic results also explain why mixtures of spherically-symmetric Gaussians, used heuristically in many popular data analysis algorithms, represent an optimal choice for the non-parametric probability distributions when working with squared Euclidean distances, combining expected error minimality, maximal entropy/unbiasedness, and linear cost scaling. The performance of EOS is compared to a range of commonly-used tools on synthetic problems and on partially-mislabeled supervised classification problems from biomedicine.
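To illustrate the flavor of such an entropy-regularized weighting (a minimal sketch, not the authors' implementation): minimizing the weighted expected error over the probability simplex with a Shannon entropy regularizer of strength eps yields Gibbs-type weights proportional to exp(-error/eps), which can be evaluated in a single pass over the per-sample errors and do not depend on the data dimension. The function name entropic_outlier_weights, the choice of eps, and the toy data below are illustrative assumptions.

import numpy as np

def entropic_outlier_weights(errors, eps):
    # Gibbs-type weights w_i proportional to exp(-errors_i / eps): the
    # closed-form minimizer of sum_i w_i * errors_i - eps * H(w) over the
    # probability simplex, where H is the Shannon entropy.
    z = -np.asarray(errors, dtype=float) / eps
    z -= z.max()                  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()            # normalize so the weights sum to one

# Toy usage (hypothetical data): mostly-Gaussian points plus a few gross
# outliers; the errors are squared Euclidean distances to the median.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:5] += 10.0                     # inject five outliers
sq_dist = ((X - np.median(X, axis=0)) ** 2).sum(axis=1)
w = entropic_outlier_weights(sq_dist, eps=sq_dist.mean())
print(w[:5].round(6))             # outliers receive near-zero weights
print(w[5:10].round(6))           # inliers receive noticeably larger weights

Because the weights are evaluated element-wise, the extra per-iteration cost grows linearly with the number of samples, consistent with the scaling claimed in the abstract.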

research · 05/16/2022
Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...

research · 06/26/2019
Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
We study two problems in high-dimensional robust statistics: robust mean...

research · 03/07/2020
RCC-Dual-GAN: An Efficient Approach for Outlier Detection with Few Identified Anomalies
Outlier detection is an important task in data mining and many technolog...

research · 02/19/2020
Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances
Distances between probability distributions that take into account the g...

research · 06/08/2017
Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settin...

research · 06/12/2023
Analysis of the Relative Entropy Asymmetry in the Regularization of Empirical Risk Minimization
The effect of the relative entropy asymmetry is analyzed in the empirica...
