Outliers Detection Is Not So Hard: Approximation Algorithms for Robust Clustering Problems Using Local Search Techniques
In this paper, we consider two robust models of the k-median/k-means problems: the outlier version (k-MedO/k-MeaO) and the penalty version (k-MedP/k-MeaP), in both of which some points can be marked as outliers and discarded. In k-MedO/k-MeaO, the number of outliers is bounded by a given integer. In k-MedP/k-MeaP, the number of outliers is not bounded, but each outlier incurs a penalty cost. We develop a new technique for analyzing the approximation ratio of local search algorithms for these problems by introducing an adapted cluster that captures useful information about the outliers in both the locally optimal and the globally optimal solutions. For k-MeaP, we improve the best known approximation ratio based on local search from 25+ε to 9+ε. For k-MedP, we obtain the best known approximation ratio. For k-MedO/k-MeaO, there exist only two bi-criteria approximation algorithms based on local search. One violates the outlier constraint (the constraint on the number of outliers), while the other violates the cardinality constraint (the constraint on the number of clusters). We consider the former algorithm and improve its approximation ratios from 17+ε to 3+ε for k-MedO, and from 274+ε to 9+ε for k-MeaO.
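To make the two robust objectives concrete, the following minimal sketch evaluates them for a fixed set of centers. It is an illustration only, not the local search algorithm analyzed in the paper. It assumes points in Euclidean space, and the names `outlier_cost`, `penalty_cost`, `z`, and `penalties` are hypothetical, chosen for this example. For fixed centers, the outlier version is minimized by discarding the z points farthest from their nearest center, and the penalty version charges each point the smaller of its connection cost and its penalty.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def connection_costs(points: List[Point], centers: List[Point],
                     squared: bool) -> List[float]:
    """Cost of connecting each point to its nearest center.

    squared=False gives the k-median cost (distance),
    squared=True gives the k-means cost (squared distance).
    """
    costs = []
    for p in points:
        d = min(math.dist(p, c) for c in centers)
        costs.append(d * d if squared else d)
    return costs


def outlier_cost(points: List[Point], centers: List[Point], z: int,
                 squared: bool = False) -> float:
    """Outlier-version objective (k-MedO / k-MeaO) for fixed centers.

    At most z points may be discarded; since costs are non-negative,
    dropping the z most expensive points is optimal.
    """
    costs = sorted(connection_costs(points, centers, squared))
    keep = max(len(costs) - z, 0)
    return sum(costs[:keep])


def penalty_cost(points: List[Point], centers: List[Point],
                 penalties: List[float], squared: bool = False) -> float:
    """Penalty-version objective (k-MedP / k-MeaP) for fixed centers.

    Each point either pays its connection cost or is marked as an
    outlier and pays its own penalty, whichever is cheaper.
    """
    costs = connection_costs(points, centers, squared)
    return sum(min(c, pen) for c, pen in zip(costs, penalties))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    ctrs = [(0.0, 0.0)]
    print(outlier_cost(pts, ctrs, z=1))                  # k-MedO cost
    print(penalty_cost(pts, ctrs, [5.0, 5.0, 5.0]))      # k-MedP cost
```

In this toy instance the far point at (10, 0) is discarded in the outlier version and pays its penalty in the penalty version. A local search algorithm would repeatedly swap centers in and out while evaluating objectives of this kind.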