CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering

09/26/2019
by Xiangrui Zeng, et al.

Feature selection is an important and challenging task in high-dimensional clustering. In genomics, for example, only a small number of genes may be differentially expressed and therefore informative about the overall clustering structure. Existing feature selection methods, such as Sparse K-means, rarely account for features that separate only a subset of clusters. In genomics, it is highly likely that a gene defines only one subtype against all the others, or distinguishes one pair of subtypes but not the rest. In this paper, we propose a K-means-based clustering algorithm that discovers informative features and also identifies which cluster pairs each selected feature can separate. The method is essentially an EM algorithm in which we introduce lasso-type constraints on each cluster pair in the M step, and make the E step possible by maximizing the raw cross-cluster distance instead of minimizing the intra-cluster distance. We demonstrate the method on simulated data and a leukemia gene expression dataset.
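To make the idea of lasso-type constraints on each cluster pair more concrete, the following is a minimal Python sketch, not the paper's actual formulation. It assumes that cluster labels are already given, that the separation of a cluster pair on a feature is measured by the squared difference of the two centroids, and that each pair's weight vector is obtained with the soft-thresholding update used in Sparse K-means (maximize a weighted sum under an L2 bound of 1 and an L1 bound `s`). The function names `l1_bounded_weights` and `pair_specific_weights` and the parameter `s` are hypothetical and chosen for illustration only.

```python
import itertools
import numpy as np


def _unit(v):
    """Scale v to unit L2 norm; return v unchanged if it is all zeros."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def l1_bounded_weights(d, s, n_iter=50):
    """Maximize w @ d subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Uses soft-thresholding of d with a threshold found by bisection,
    as in the Sparse K-means weight update (assumed here, not taken
    from the paper).
    """
    d = np.maximum(np.asarray(d, dtype=float), 0.0)
    w = _unit(d)
    if w.sum() <= s:
        return w  # L1 bound already satisfied, no thresholding needed
    lo, hi = 0.0, d.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        w = _unit(np.maximum(d - mid, 0.0))
        if w.sum() > s:
            lo = mid  # still too dense: raise the threshold
        else:
            hi = mid  # feasible: try a smaller threshold
    return _unit(np.maximum(d - hi, 0.0))


def pair_specific_weights(X, labels, s):
    """Return one sparse feature-weight vector per cluster pair.

    Assumption: pair separation on feature j is the squared difference
    between the two cluster centroids on that feature.
    """
    clusters = np.unique(labels)
    centroids = {k: X[labels == k].mean(axis=0) for k in clusters}
    weights = {}
    for k, l in itertools.combinations(clusters, 2):
        d = (centroids[k] - centroids[l]) ** 2
        weights[(int(k), int(l))] = l1_bounded_weights(d, s)
    return weights


# Simulated example: feature 0 separates cluster 0 from the rest,
# feature 1 separates cluster 2 from the rest, all others are noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 20))
X[labels == 0, 0] += 5.0
X[labels == 2, 1] += 5.0

for pair, w in pair_specific_weights(X, labels, s=1.5).items():
    print(pair, "top feature:", int(np.argmax(w)))
```

In this sketch a feature can carry a large weight for one cluster pair and a zero or negligible weight for another, which is the kind of cluster-specific selection the abstract describes. A full algorithm would alternate such a weight update with a cluster reassignment step.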


