Active Sampling for Linear Regression Beyond the ℓ_2 Norm

11/09/2021
by Cameron Musco, et al.

We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector b ∈ ℝ^n and output a near minimizer to min_{x∈ℝ^d} ‖Ax − b‖, where A ∈ ℝ^{n×d} is a design matrix and ‖·‖ is some loss function. For ℓ_p norm regression for any 0 < p < ∞, we give an algorithm based on Lewis weight sampling that outputs a (1+ϵ)-approximate solution using just Õ(d^{max(1, p/2)}/poly(ϵ)) queries to b. We show that this dependence on d is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Dereziński, who gave near-optimal bounds for the ℓ_1 norm, and suboptimal bounds for ℓ_p regression with p ∈ (1, 2). We also provide the first total sensitivity upper bound of O(d^{max(1, p/2)} log^2 n) for loss functions with at most degree-p polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the ℓ_p regression result, we obtain an active regression algorithm making Õ(d^{1 + max(1, p/2)}/poly(ϵ)) queries, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve our bound to an active sample complexity of Õ(d^{(1+√2)/2}/ϵ^c) and a non-active sample complexity of Õ(d^{4−2√2}/ϵ^c), improving a previous d^4 bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every ℓ_p norm.
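To make the Lewis weight sampling pipeline concrete, here is a minimal Python sketch under simplifying assumptions: Lewis weights are computed with the standard fixed-point iteration of Cohen and Peng (which converges for p < 4), rows are sampled independently with probabilities proportional to their weights, only the sampled entries of b are queried, and the subsampled problem is solved by IRLS. The names `active_lp_regression` and `query_b` are illustrative, not from the paper, and this sketch omits the paper's refinements (e.g., its handling of large p and its precise ϵ dependence).

```python
import numpy as np

def lp_lewis_weights(A, p, iters=30):
    """Approximate l_p Lewis weights of A via the Cohen-Peng
    fixed-point iteration (converges for p < 4)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        # Reweight rows by W^{1 - 2/p}, then take leverage-type scores
        W = w ** (1.0 - 2.0 / p)
        M = A.T @ (A * W[:, None])               # A^T W^{1-2/p} A
        G = np.linalg.pinv(M)
        tau = np.einsum('ij,jk,ik->i', A, G, A)  # a_i^T M^{-1} a_i
        w = np.maximum(tau, 1e-12) ** (p / 2.0)
    return w

def active_lp_regression(A, query_b, p, m, irls_iters=50):
    """Sample about m rows by Lewis weights, query only those entries
    of b via query_b(i), and solve the reweighted l_p regression on
    the sample. query_b is the only access to b."""
    n, d = A.shape
    w = lp_lewis_weights(A, p)
    prob = np.minimum(1.0, m * w / w.sum())
    idx = np.flatnonzero(np.random.rand(n) < prob)
    # Importance weights 1/prob_i^{1/p} make the sampled l_p loss an
    # unbiased estimate of the full loss
    scale = prob[idx] ** (-1.0 / p)
    As = A[idx] * scale[:, None]
    bs = np.array([query_b(i) for i in idx]) * scale
    # Least-squares initialization, then IRLS for general p
    x = np.linalg.lstsq(As, bs, rcond=None)[0]
    for _ in range(0 if p == 2 else irls_iters):
        r = np.abs(As @ x - bs)
        u = np.maximum(r, 1e-9) ** (p - 2.0)     # IRLS weights |r_i|^{p-2}
        x = np.linalg.lstsq(As * np.sqrt(u)[:, None],
                            bs * np.sqrt(u), rcond=None)[0]
    return x
```

Choosing m on the order of the theorem's Õ(d^{max(1, p/2)}/poly(ϵ)) is what yields a (1+ϵ)-approximate minimizer while querying only a sublinear number of entries of b; the IRLS solve above is one standard choice for the subsampled ℓ_p problem, not the specific solver analyzed in the paper.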

