A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space

02/12/2020
by Shahin Shahrampour, et al.

In supervised learning with kernel methods, one encounters a large-scale finite-sum minimization over a reproducing kernel Hilbert space (RKHS). Large-scale finite-sum problems can often be solved with efficient variants of Newton's method in which the Hessian is approximated via sub-samples. In an RKHS, however, the dependence of the penalty function on the kernel makes standard sub-sampling approaches inapplicable, since the Gram matrix is not readily available in low-rank form. In this paper, we observe that for this class of problems one can naturally use kernel approximation to speed up Newton's method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global convergence, each with high probability. The key to our analysis is showing that the Hessian approximated via random features preserves the spectrum of the original Hessian. We provide numerical experiments verifying the efficiency of our approach compared with variants of sub-sampling methods.
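To give a sense of the idea, here is a minimal sketch of a Newton step in a random-feature space. It assumes an RBF kernel approximated by Rahimi-Recht random Fourier features and a ridge-regularized squared loss; it is an illustration of the general approach, not the authors' exact algorithm, and the function names and parameters are our own.

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    # Random Fourier features approximating the RBF kernel
    # k(x, y) = exp(-gamma * ||x - y||^2), so that z(x)^T z(y) ~ k(x, y).
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def newton_rff_ridge(X, y, D=200, gamma=1.0, lam=1e-3, seed=0):
    # Ridge-regularized squared loss in the random-feature space:
    #   f(w) = (1/2n) ||Z w - y||^2 + (lam/2) ||w||^2.
    # The Hessian (1/n) Z^T Z + lam I is only D x D, so the Newton
    # system is cheap to solve, unlike the n x n Gram-matrix system.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = random_fourier_features(X, D, gamma, rng)
    w = np.zeros(D)
    for _ in range(5):  # quadratic objective: one Newton step already converges
        grad = Z.T @ (Z @ w - y) / n + lam * w
        H = Z.T @ Z / n + lam * np.eye(D)
        w -= np.linalg.solve(H, grad)
    return Z, w
```

For non-quadratic losses the same structure applies, with the gradient and Hessian of the loss evaluated in the D-dimensional feature space at each iteration.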


research
01/18/2016

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and...
research
10/16/2021

Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization

The quasi-Newton methods generally provide curvature information by appr...
research
05/27/2020

PNKH-B: A Projected Newton-Krylov Method for Large-Scale Bound-Constrained Optimization

We present PNKH-B, a projected Newton-Krylov method with a low-rank appr...
research
08/23/2017

Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information

We consider variants of trust-region and cubic regularization methods fo...
research
01/18/2016

Sub-Sampled Newton Methods II: Local Convergence Rates

Many data-fitting applications require the solution of an optimization p...
research
02/11/2022

Measuring dissimilarity with diffeomorphism invariance

Measures of similarity (or dissimilarity) are a key ingredient to many m...
research
10/11/2019

A General Scoring Rule for Randomized Kernel Approximation with Application to Canonical Correlation Analysis

Random features has been widely used for kernel approximation in large-s...
