Do Subsampled Newton Methods Work for High-Dimensional Data?
Subsampled Newton methods approximate the Hessian matrix by subsampling, reducing the cost of forming the Hessian while still using sufficient curvature information. However, previous results require Ω(d) samples to approximate the Hessian, where d is the dimension of the data points, which makes these methods less practical for high-dimensional data. The situation worsens when d is comparable to the number of data points n: the whole dataset must then be taken into account, which makes subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods on high-dimensional data. Specifically, we prove that only Θ(d^γ_eff) samples are needed to approximate the Hessian, where d^γ_eff is the γ-ridge leverage and can be much smaller than d as long as nγ ≫ 1. Additionally, we extend this result so that subsampled Newton methods work for high-dimensional data in both distributed optimization problems and non-smooth regularized problems.
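To make the quantity d^γ_eff concrete, the sketch below computes ridge leverage scores and their sum for a small data matrix. It assumes the common definition l_i = a_i^T (A^T A + nγ I)^{-1} a_i with d^γ_eff = Σ_i l_i; the abstract does not state the exact normalisation, so the nγ scaling, the function names, and the synthetic data are all illustrative assumptions rather than the paper's own construction.

```python
import random


def gram_plus_ridge(A, reg):
    """Return A^T A + reg * I as a d x d list of lists."""
    d = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(d)] for i in range(d)]
    for i in range(d):
        G[i][i] += reg
    return G


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ridge_leverage_scores(A, gamma):
    """Assumed definition: l_i = a_i^T (A^T A + n*gamma*I)^{-1} a_i."""
    n = len(A)
    G = gram_plus_ridge(A, n * gamma)
    return [sum(a_k * z_k for a_k, z_k in zip(a, solve(G, a))) for a in A]


def effective_dimension(A, gamma):
    """Assumed definition: d_eff^gamma is the sum of ridge leverage scores."""
    return sum(ridge_leverage_scores(A, gamma))


if __name__ == "__main__":
    random.seed(0)
    n, d = 60, 8
    # Synthetic data whose columns decay in scale, so the spectrum decays too.
    A = [[random.gauss(0, 1) / (j + 1) for j in range(d)] for _ in range(n)]
    for gamma in (0.0, 0.01, 0.1, 1.0):
        print(f"gamma={gamma}: d_eff={effective_dimension(A, gamma):.3f} (d={d})")
```

Under this definition the sum equals Σ_j σ_j² / (σ_j² + nγ) over the singular values σ_j of A, so it equals d at γ = 0 for a full-rank A and shrinks as nγ grows, which is the regime where the abstract's Θ(d^γ_eff) sample bound improves on Ω(d).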