Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits

11/19/2021
by Hao Chen, et al.

Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples, owing to their generalization performance and intrinsic computational advantage. With correlated samples, however, the stochastic gradient is a biased estimator of the full gradient, which has left the behavior of SGD in correlated settings poorly understood theoretically and has hindered its use in such cases. In this paper, we focus on hyperparameter estimation for the Gaussian process (GP) and take a step towards breaking this barrier by proving that minibatch SGD converges to a critical point of the full log-likelihood loss function and recovers the model hyperparameters at rate O(1/K) for K iterations, up to a statistical error term depending on the minibatch size. Our theoretical guarantees hold provided that the kernel functions exhibit exponential or polynomial eigendecay, which is satisfied by a wide range of kernels commonly used in GPs. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD generalizes better than state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored data size regime for GPs.
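
To make the idea of minibatch SGD for GP hyperparameter estimation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify any of the following: the RBF kernel with a fixed lengthscale, the two hyperparameters (signal variance and noise variance), the 1/k step-size schedule, the positivity floor, and all function names (`rbf_gram`, `minibatch_nll_grad`, `sgd_gp`) are hypothetical choices. Each iteration draws a minibatch, forms the negative log-likelihood gradient of that minibatch alone (ignoring correlation with the remaining points, which is why the estimator is biased for the full gradient), and takes a projected gradient step.

```python
import numpy as np


def rbf_gram(X, lengthscale=1.0):
    """Unit-variance RBF Gram matrix for the rows of X (lengthscale is assumed fixed)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def minibatch_nll_grad(theta, Xb, yb, lengthscale=1.0):
    """Gradient of the per-sample minibatch negative log-likelihood.

    theta[0] is the signal variance and theta[1] the noise variance, so the
    minibatch covariance is K = theta[0] * Kf + theta[1] * I.
    """
    m = len(yb)
    Kf = rbf_gram(Xb, lengthscale)
    K = theta[0] * Kf + theta[1] * np.eye(m)
    # An explicit inverse is acceptable here because the minibatch is small.
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ yb
    grads = []
    for dK in (Kf, np.eye(m)):  # dK/dtheta[0], dK/dtheta[1]
        grads.append(0.5 * np.trace(K_inv @ dK) - 0.5 * alpha @ dK @ alpha)
    return np.array(grads) / m


def sgd_gp(X, y, theta0, batch_size=32, n_iters=200, step0=1.0,
           floor=1e-4, seed=0):
    """Minibatch SGD on (signal variance, noise variance).

    The step size step0 / k and the projection onto [floor, inf) are
    assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for k in range(1, n_iters + 1):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grad = minibatch_nll_grad(theta, X[idx], y[idx])
        theta = np.maximum(theta - (step0 / k) * grad, floor)
    return theta
```

A call such as `sgd_gp(X, y, theta0=[1.0, 1.0])` with `X` of shape `(n, d)` and `y` of shape `(n,)` returns the two estimated variances. Each iteration costs on the order of the cube of the minibatch size rather than the cube of n, which is the source of the computational saving the abstract refers to.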

research
05/28/2023

Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality

Stochastic gradient descent with momentum (SGDM) has been widely used in...
research
06/07/2022

Integrating Random Effects in Deep Neural Networks

Modern approaches to supervised learning like deep neural networks (DNNs...
research
12/18/2017

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

Stochastic Gradient Descent (SGD) with small mini-batch is a key compone...
research
05/20/2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differen...
research
07/01/2021

Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning

Gaussian processes remain popular as a flexible and expressive model cla...
research
02/18/2018

Optimizing Spectral Sums using Randomized Chebyshev Expansions

The trace of matrix functions, often called spectral sums, e.g., rank, l...
research
06/19/2018

Faster SGD training by minibatch persistency

It is well known that, for most datasets, the use of large-size minibatc...