Subspace approximation with outliers

06/30/2020
by Amit Deshpande et al.

The subspace approximation problem with outliers is the following: given n points x_1, …, x_n ∈ R^d, an integer 1 ≤ k ≤ d, and an outlier parameter 0 ≤ α ≤ 1, find a k-dimensional linear subspace of R^d that minimizes the sum of squared distances to its nearest (1-α)n points. More generally, the ℓ_p subspace approximation problem with outliers minimizes the sum of p-th powers of distances instead of the sum of squared distances. Even the special case p = 2, i.e., robust PCA with outliers, is non-trivial, and previous work requires additional assumptions on the input.

Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the (1-α)n inliers in the optimal solution are promised to lie exactly on a k-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard.

We show how to extend dimension-reduction techniques and sampling-based bi-criteria approximations to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared-distance error of the optimal k-dimensional subspace, summed over the optimal (1-α)n inliers, is at least δ times its squared error summed over all n points, for some 0 < δ ≤ 1 - α. Under this assumption, we give an efficient algorithm that finds a subset of poly(k/ϵ) log(1/δ) loglog(1/δ) points whose span contains a k-dimensional subspace giving a multiplicative (1+ϵ)-approximation to the optimal solution. The running time of our algorithm is linear in n and d. Interestingly, our results hold even when the fraction of outliers α is large, as long as the obvious condition 0 < δ ≤ 1 - α is satisfied.
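To make the objective and the δ-assumption concrete, here is a minimal NumPy sketch. It is not from the paper: the function names, the representation of a subspace by an orthonormal basis, and the helper `delta_ratio` are illustrative assumptions. It evaluates the ℓ_p cost of a candidate subspace after discarding the αn farthest points, and the inlier-to-total cost ratio that the paper's assumption requires to be at least δ for the optimal subspace.

```python
import numpy as np

def outlier_robust_cost(X, V, alpha, p=2):
    """Sum of p-th powers of distances from the (1 - alpha) * n points
    nearest to the subspace spanned by the rows of V.

    X: (n, d) array of points.
    V: (k, d) array with orthonormal rows spanning the candidate subspace.
    alpha: fraction of points discarded as outliers, 0 <= alpha <= 1.
    """
    # Distance from x to span(V) is ||x - (x V^T) V|| when V has orthonormal rows.
    residuals = X - (X @ V.T) @ V
    dists = np.linalg.norm(residuals, axis=1)
    n = X.shape[0]
    m = int(np.ceil((1 - alpha) * n))  # number of inliers kept
    if m == 0:
        return 0.0
    # Keep the m smallest distances (the presumed inliers); order is irrelevant.
    kept = np.partition(dists, m - 1)[:m]
    return float(np.sum(kept ** p))

def delta_ratio(X, V, alpha):
    """Ratio of the inlier cost to the total cost for a given subspace.
    The paper's assumption asks this ratio to be >= delta for the optimal
    subspace (hypothetical helper, for illustration only)."""
    return outlier_robust_cost(X, V, alpha) / outlier_robust_cost(X, V, 0.0)

# Example: 200 points near a random 2-dimensional subspace of R^10,
# plus 20 far-away outliers, so alpha = 20/220.
rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # orthonormal d x k basis
inliers = rng.standard_normal((200, 2)) @ B.T + 0.01 * rng.standard_normal((200, 10))
outliers = 100.0 * rng.standard_normal((20, 10))
X = np.vstack([inliers, outliers])
print(outlier_robust_cost(X, B.T, alpha=20 / 220))  # small: outliers discarded
print(delta_ratio(X, B.T, alpha=20 / 220))          # tiny ratio: outliers dominate the total cost
```

As the example suggests, when the outliers lie far from the subspace the inlier cost is a tiny fraction of the total cost, which is why the δ-assumption (with small δ) is a mild restriction rather than a strong promise about the input.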
