Feature screening for clustering analysis

06/22/2023
by Changhu Wang, et al.

In this paper, we consider feature screening for ultrahigh-dimensional clustering analysis. Because the marginal distribution of any given feature is a mixture of its conditional distributions in the different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important, cluster-relevant features have heterogeneous components in their mixture distributions, whereas unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate homogeneity. Under general parametric settings, we establish tail probability bounds of the EM-test statistic for homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independent screening property and even selection consistency. The limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, accurately screens for important cluster-relevant features, and helps to significantly improve clustering, as demonstrated in our extensive simulations and real data analyses.
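
The per-feature screening idea can be illustrated with a minimal sketch. This is not the EM-test of the paper: as a simplified stand-in, it scores each feature with a plain likelihood-ratio statistic that compares a single normal distribution against a two-component, equal-variance normal mixture fitted by a fixed number of EM iterations. The function names (`homogeneity_score`, `screen_features`), the normal model, the quartile initialization, and the choice of keeping the top `n_keep` features are all assumptions made for illustration.

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log-density of N(mean, var), evaluated elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def homogeneity_score(x, n_iter=50):
    """Likelihood-ratio style score for one feature (illustrative stand-in,
    NOT the EM-test): 2 * (log-lik of a 2-component equal-variance normal
    mixture - log-lik of a single normal). Larger means more heterogeneous."""
    x = np.asarray(x, dtype=float)
    n = x.size
    var0 = max(x.var(), 1e-12)
    loglik_null = normal_logpdf(x, x.mean(), var0).sum()

    # Assumed initialization: component means at the quartiles.
    mu = np.quantile(x, [0.25, 0.75])
    pi = np.array([0.5, 0.5])
    var = var0
    for _ in range(n_iter):
        # E-step: posterior component weights for every observation.
        logw = np.log(pi) + normal_logpdf(x[:, None], mu, var)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: update proportions, means, and the shared variance.
        nk = np.maximum(w.sum(axis=0), 1e-12)
        pi = np.clip(nk / n, 1e-6, 1.0)
        pi /= pi.sum()
        mu = (w * x[:, None]).sum(axis=0) / nk
        var = max((w * (x[:, None] - mu) ** 2).sum() / n, 1e-12)

    comp = np.log(pi) + normal_logpdf(x[:, None], mu, var)
    loglik_mix = np.logaddexp(comp[:, 0], comp[:, 1]).sum()
    return 2.0 * (loglik_mix - loglik_null)


def screen_features(X, n_keep):
    """Score every column independently and keep the n_keep highest scores."""
    scores = np.array([homogeneity_score(X[:, j]) for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep), scores


# Synthetic example: only the first 3 of 50 features depend on the cluster.
rng = np.random.default_rng(0)
n, p = 200, 50
labels = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :3] += np.where(labels == 1, 2.0, -2.0)[:, None]

selected, scores = screen_features(X, n_keep=3)
print("selected features:", selected)
```

Because each feature is scored on its own, the loop over columns is trivially parallel, which reflects the computational efficiency claimed for marginal screening. A threshold on the score could replace the fixed `n_keep`; how to choose such a threshold is not specified here.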


research
02/08/2019

Testing the Order of Multivariate Normal Mixture Models

Finite mixtures of multivariate normal distributions have been widely us...
research
07/06/2021

Neural Mixture Models with Expectation-Maximization for End-to-end Deep Clustering

Any clustering algorithm must synchronously learn to model the clusters ...
research
07/06/2012

Mixtures of Shifted Asymmetric Laplace Distributions

A mixture of shifted asymmetric Laplace distributions is introduced and ...
research
10/13/2020

The Kendall Interaction Filter for Variable Interaction Screening in Ultra High Dimensional Classification Problems

Accounting for important interaction effects can improve prediction of m...
research
09/04/2015

EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis

Data clustering has received a lot of attention and numerous methods, al...
research
11/24/2021

Asymptotics for Markov chain mixture detection

Sufficient conditions are provided under which the log-likelihood ratio ...
research
05/05/2020

Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications

We propose a simple yet powerful test statistic to quantify the discrepa...
