Energy Clustering

10/26/2017
by Guilherme França, et al.

Energy statistics was proposed by Székely in the 1980s, inspired by the Newtonian gravitational potential from classical mechanics, and it provides a hypothesis test for equality of distributions. It was further generalized from Euclidean spaces to metric spaces of strong negative type, and more recently a connection with reproducing kernel Hilbert spaces (RKHS) was established. Here we consider the clustering problem from the perspective of energy statistics theory, providing a precise mathematical formulation that yields a quadratically constrained quadratic program (QCQP) in the associated RKHS, thus establishing the connection with kernel methods. We show that this QCQP is equivalent to the kernel k-means optimization problem once the kernel is fixed. These results imply a first-principles derivation of kernel k-means from energy statistics. However, energy statistics fixes a family of standard kernels. Furthermore, we also consider a weighted version of energy statistics, making a connection to graph partitioning problems. To find local optimizers of this QCQP we propose an iterative algorithm based on Hartigan's method, which in this case has the same computational cost as the kernel k-means algorithm based on Lloyd's heuristic, but usually yields better clustering quality. We provide carefully designed numerical experiments showing the superiority of the proposed method compared to kernel k-means, spectral clustering, standard k-means, and Gaussian mixture models in a variety of settings.
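To make the link between energy statistics and kernels concrete, the following is a minimal sketch (not the authors' code) of the Euclidean energy distance between two samples, 2 E|X - Y| - E|X - X'| - E|Y - Y'|, together with the kernel induced by the distance, k(x, y) = (|x - x0| + |y - x0| - |x - y|) / 2. The function names, the base point x0, and the synthetic Gaussian samples are assumptions chosen for illustration. With plug-in (V-statistic) estimates, the energy distance equals twice the squared maximum mean discrepancy under this kernel.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_distance(x, y):
    """Plug-in estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2.0 * pairwise_dist(x, y).mean()
            - pairwise_dist(x, x).mean()
            - pairwise_dist(y, y).mean())


def distance_kernel(a, b, x0):
    """Kernel induced by the Euclidean distance, centered at x0 (illustrative choice)."""
    da = pairwise_dist(a, x0[None, :])  # shape (n, 1)
    db = pairwise_dist(b, x0[None, :])  # shape (m, 1)
    return 0.5 * (da + db.T - pairwise_dist(a, b))


def mmd_squared(x, y, x0):
    """Plug-in squared maximum mean discrepancy under distance_kernel."""
    return (distance_kernel(x, x, x0).mean()
            + distance_kernel(y, y, x0).mean()
            - 2.0 * distance_kernel(x, y, x0).mean())


# Synthetic data, assumed purely for illustration: two shifted Gaussian samples.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(1.0, 1.0, size=(200, 2))
x0 = np.zeros(2)

e_xy = energy_distance(x, y)
mmd2_xy = mmd_squared(x, y, x0)
print("energy distance:", e_xy)
print("2 * MMD^2      :", 2.0 * mmd2_xy)
```

The base point x0 cancels in the discrepancy, so any choice gives the same value; this is one way to see why energy statistics fixes a family of kernels rather than leaving the kernel free.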

