Quantizing Multiple Sources to a Common Cluster Center: An Asymptotic Analysis
We consider quantizing an Ld-dimensional sample, which is obtained by concatenating L vectors from datasets of d-dimensional vectors, to a d-dimensional cluster center. The distortion measure is the weighted sum of rth powers of the distances between the cluster center and the samples. For L=1, one recovers the ordinary center-based clustering formulation. The general case L>1 appears when one wishes to cluster a dataset through L noisy observations of each of its members. We find a formula for the average distortion performance in the asymptotic regime where the number of cluster centers is large. We also provide an algorithm to numerically optimize the cluster centers and verify our analytical results on real and artificial datasets. In terms of faithfulness to the original (noiseless) dataset, our clustering approach outperforms the naive approach that relies on quantizing the Ld-dimensional noisy observation vectors to Ld-dimensional centers.
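To make the distortion measure concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes Euclidean distances and, for the optimization step, the special case r=2, where the best center of a cell is the mean of the weighted averages of the L observations. The function names `distortion` and `lloyd_r2`, the array layout, and the Lloyd-style alternation are all illustrative assumptions.

```python
import numpy as np


def distortion(samples, centers, weights, r=2.0):
    """Weighted sum of r-th powers of distances (Euclidean norm assumed).

    samples: (N, L, d) array, L observations of each of N members.
    centers: (K, d) array of d-dimensional cluster centers.
    weights: (L,) array of nonnegative weights.
    Returns an (N, K) array of costs of quantizing sample n to center k.
    """
    diff = samples[:, :, None, :] - centers[None, None, :, :]  # (N, L, K, d)
    dist = np.linalg.norm(diff, axis=-1)                       # (N, L, K)
    return np.einsum("l,nlk->nk", weights, dist ** r)


def lloyd_r2(samples, k, weights, iters=50, seed=0):
    """Lloyd-style alternation for r=2 only (an assumed stand-in optimizer)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    n = samples.shape[0]
    # For r=2 the cost equals sum(w) * ||xbar_n - c||^2 plus a term that does
    # not depend on c, where xbar_n is the weighted mean of the L observations.
    xbar = np.einsum("l,nld->nd", w, samples) / w.sum()
    centers = xbar[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        assign = distortion(samples, centers, w, r=2.0).argmin(axis=1)
        for j in range(k):
            members = xbar[assign == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)
    cost = distortion(samples, centers, w, r=2.0)
    assign = cost.argmin(axis=1)
    return centers, assign, cost[np.arange(n), assign].mean()
```

With L=1 and unit weight, `lloyd_r2` reduces to ordinary k-means. For r other than 2 the cell-wise minimizer generally has no closed form, so the center update would need a numerical solver.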