Random Function Descent

05/02/2023
by Felix Benning et al.

While gradient-based methods are ubiquitous in machine learning, selecting the right step size often requires "hyperparameter tuning". This is because backtracking procedures like Armijo's rule depend on quality evaluations in every step, which are not available in a stochastic context. Since optimization schemes can be motivated using Taylor approximations, we replace the Taylor approximation with the conditional expectation (the best L^2 estimator) and propose "Random Function Descent" (RFD). Under light assumptions common in Bayesian optimization, we prove that RFD is identical to gradient descent, but with calculable step sizes, even in a stochastic context. We beat untuned Adam in synthetic benchmarks. To close the performance gap to tuned Adam, we propose a heuristic extension that is competitive with it.
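
To make the "calculable step size" claim concrete, here is a minimal sketch of one RFD-style step, assuming a Gaussian-process model with constant mean mu and squared-exponential covariance. The covariance model, the helper name rfd_step, and the hyperparameters mu and lam are illustrative assumptions, not the paper's exact setup; minimizing the conditional expectation of the loss along the negative gradient under this model reduces the step-size choice to a closed form:

```python
import numpy as np

def rfd_step(w, loss, grad, mu=0.0, lam=1.0):
    """Hedged sketch of one RFD-style step under an assumed squared-exponential
    covariance model C(x, y) = sigma**2 * exp(-||x - y||**2 / (2 * lam**2)).

    Conditioning a Gaussian process with constant mean `mu` on the observed
    loss and gradient at `w`, the conditional expectation of the loss at
    distance s along the negative gradient direction is
        m(s) = mu + exp(-s**2 / (2 * lam**2)) * ((loss - mu) - s * ||grad||).
    Setting m'(s) = 0 gives the closed-form minimizer used below; sigma**2
    cancels out of the step size.
    """
    g_norm = np.linalg.norm(grad)
    if g_norm == 0.0:
        return w  # the surrogate is already minimized at w
    excess = loss - mu  # current loss relative to the prior mean
    # positive root of  s**2 * ||g|| - s * (loss - mu) - lam**2 * ||g|| = 0
    s = (excess + np.hypot(excess, 2.0 * lam * g_norm)) / (2.0 * g_norm)
    return w - s * grad / g_norm

# toy usage: descend a quadratic bowl with loss 0.5 * ||w||^2
w = np.array([3.0, -2.0])
for _ in range(20):
    loss, grad = 0.5 * w @ w, w
    w = rfd_step(w, loss, grad, mu=0.0, lam=1.0)
print(w)  # close to the minimizer at the origin
```

Because sigma**2 drops out, the step size depends only on the length scale lam and on how far the current loss sits above the prior mean mu, which is why no backtracking line search, and hence no extra quality evaluations, are needed.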


Related research

11/08/2015 · Speed learning on the fly
The practical performance of online stochastic gradient descent algorith...

04/03/2022 · Understanding the unstable convergence of gradient descent
Most existing analyses of (stochastic) gradient descent rely on the cond...

08/12/2013 · Faster gradient descent and the efficient recovery of images
Much recent attention has been devoted to gradient descent algorithms wh...

04/16/2018 · Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Stochastic gradient methods enable learning probabilistic models from la...

10/02/2020 · A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
A fundamental challenge in deep learning is that the optimal step sizes ...

09/04/2023 · Homomorphically encrypted gradient descent algorithms for quadratic programming
In this paper, we evaluate the different fully homomorphic encryption sc...

07/01/2020 · Decentralised Learning with Random Features and Distributed Gradient Descent
We investigate the generalisation performance of Distributed Gradient De...
