Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

by Alexander Immer et al.

Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients and the training data alone. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, which limits the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds on the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow estimation accuracy to be traded off against computational complexity. We derive them using the function-space form of the linearized Laplace, which can be estimated via the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
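
To make the function-space view concrete, below is a minimal JAX sketch, not the authors' estimator or code: it builds the empirical neural tangent kernel of a toy single-output network on one mini-batch, evaluates a GP-style linearized-Laplace evidence N(y | f(x), K/delta + sigma^2 I) up to an additive constant, and differentiates it with respect to the hyperparameters. The network, batch, and all names (model_fn, delta, sigma2) are illustrative assumptions; the paper's actual contribution is principled lower bounds that control the accuracy/cost trade-off of such mini-batch estimation.

import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def model_fn(params, x):
    # Tiny MLP regressor; stands in for any differentiable network (assumption).
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"]).squeeze(-1)  # shape (N,)

def empirical_ntk(params, x):
    # K_ij = <J_i, J_j>, with J_i the Jacobian of f(x_i) w.r.t. all parameters.
    flat, unravel = ravel_pytree(params)
    f_single = lambda p, xi: model_fn(unravel(p), xi[None, :])[0]
    jac = jax.vmap(lambda xi: jax.grad(f_single)(flat, xi))(x)  # (N, P)
    return jac @ jac.T

def neg_log_marglik(hyper, params, x, y):
    # GP-form evidence N(y | f(x), K/delta + sigma^2 I) on one mini-batch,
    # up to an additive constant. Differentiating w.r.t. `hyper` yields a
    # stochastic hyper-gradient; this naive plug-in is a simplification of
    # the paper's lower-bound estimators (assumption for illustration).
    delta = jnp.exp(hyper["log_delta"])      # prior precision (illustrative)
    sigma2 = jnp.exp(hyper["log_sigma2"])    # observation noise (illustrative)
    K = empirical_ntk(params, x) / delta + sigma2 * jnp.eye(x.shape[0])
    r = y - model_fn(params, x)
    L = jnp.linalg.cholesky(K)
    alpha = jax.scipy.linalg.cho_solve((L, True), r)
    return 0.5 * r @ alpha + jnp.log(jnp.diag(L)).sum()

# Usage on a random mini-batch: one cheap hyper-gradient step's worth of signal.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = {
    "w1": jax.random.normal(k1, (2, 16)) / jnp.sqrt(2.0),
    "b1": jnp.zeros(16),
    "w2": jax.random.normal(k2, (16, 1)) / 4.0,
    "b2": jnp.zeros(1),
}
hyper = {"log_delta": jnp.array(0.0), "log_sigma2": jnp.array(-1.0)}
x = jax.random.normal(k3, (32, 2))
y = jnp.sin(x[:, 0]) + 0.1 * x[:, 1]
g = jax.grad(neg_log_marglik)(hyper, params, x, y)  # gradient w.r.t. hyperparameters

The batching here is exactly the scalability issue named in the abstract: the full linearized-Laplace evidence needs the kernel over the entire dataset, so each hyperparameter gradient costs a full pass, whereas a mini-batch kernel gives a cheap but biased stochastic gradient; the paper's lower bounds make that trade-off explicit and optimizable.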


Gradient-based Hyperparameter Optimization through Reversible Learning

Tuning hyperparameters of learning algorithms is hard because gradients ...

Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning

Marginal-likelihood based model-selection, even though promising, is rar...

General adjoint-differentiated Laplace approximation

The hierarchical prior used in Latent Gaussian models (LGMs) induces a p...

Hyperparameter Optimization through Neural Network Partitioning

Well-tuned hyperparameters are crucial for obtaining good generalization...

Online Laplace Model Selection Revisited

The Laplace approximation provides a closed-form model selection objecti...

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning

The linearised Laplace method for estimating model uncertainty has recei...

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing th...
