Real Elliptically Skewed Distributions and Their Application to Robust Cluster Analysis

by Christian A. Schroth, et al.

This article proposes a new class of Real Elliptically Skewed (RESK) distributions and associated clustering algorithms that integrate robustness and skewness into a single unified cluster analysis framework. Non-symmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications. Robustness is essential because a few outlying observations can severely obscure the cluster structure. The RESK distributions generalize the Real Elliptically Symmetric (RES) distributions. To estimate the cluster parameters and memberships, we derive an expectation-maximization (EM) algorithm for arbitrary RESK distributions. Special attention is given to a new robust skew-Huber M-estimator, which is also the maximum likelihood estimator (MLE) for the skew-Huber distribution, itself a member of the RESK class. Numerical experiments on simulated and real-world data confirm the usefulness of the proposed methods for skewed and heavy-tailed data sets.
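To illustrate why a Huber-type M-estimator resists outliers, the following is a minimal sketch of the classical symmetric, univariate Huber location estimate computed by iteratively reweighted averaging. It is not the skew-Huber estimator or the EM algorithm proposed in the article. The function names `huber_weights` and `huber_location`, the tuning constant `c = 1.345`, and the fixed MAD scale are assumptions chosen for illustration only.

```python
import numpy as np

def huber_weights(d, c=1.345):
    """Classical Huber weights: 1 for |d| <= c, c/|d| otherwise.

    Illustrative assumption: this is the standard symmetric Huber weight,
    not the skew-Huber weight from the article.
    """
    d = np.abs(np.asarray(d, dtype=float))
    w = np.ones_like(d)
    far = d > c
    w[far] = c / d[far]  # down-weight observations far from the center
    return w

def huber_location(x, c=1.345, n_iter=100, tol=1e-10):
    """Robust location estimate via iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)  # robust starting point
    # Scale held fixed at the normalized median absolute deviation (MAD).
    scale = 1.4826 * np.median(np.abs(x - mu))
    if scale == 0.0:
        scale = 1.0  # fallback for degenerate data
    for _ in range(n_iter):
        w = huber_weights((x - mu) / scale, c)
        mu_new = np.sum(w * x) / np.sum(w)  # weighted mean update
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu

# Usage: 101 inliers centered at 0 plus 5 gross outliers at 100.
data = np.concatenate([np.linspace(-1.0, 1.0, 101), np.full(5, 100.0)])
print("sample mean:   ", np.mean(data))
print("Huber estimate:", huber_location(data))
```

In this toy example the sample mean is pulled strongly toward the outliers, while the Huber estimate stays close to the center of the inliers because distant points receive weights proportional to the inverse of their standardized distance.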


