Random Function Descent

05/02/2023
by Felix Benning et al.

While gradient-based methods are ubiquitous in machine learning, selecting the right step size often requires "hyperparameter tuning". This is because backtracking procedures like Armijo's rule depend on quality evaluations in every step, which are not available in a stochastic context. Since optimization schemes can be motivated using Taylor approximations, we replace the Taylor approximation with the conditional expectation (the best L^2 estimator) and propose "Random Function Descent" (RFD). Under light assumptions common in Bayesian optimization, we prove that RFD is identical to gradient descent, but with calculable step sizes, even in a stochastic context. We beat untuned Adam in synthetic benchmarks. To close the performance gap to tuned Adam, we propose a heuristic extension that is competitive with it.
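
To make the "calculable step size" claim concrete, here is a minimal sketch of one RFD-style step, assuming a Gaussian-process model with constant mean mu and squared-exponential covariance. The covariance model, the helper name rfd_step, and the hyperparameters mu and lam are illustrative assumptions, not the paper's exact setup; minimizing the conditional expectation of the loss along the negative gradient under this model reduces the step-size choice to a closed form:

```python
import numpy as np

def rfd_step(w, loss, grad, mu=0.0, lam=1.0):
    """Hedged sketch of one RFD-style step under an assumed squared-exponential
    covariance model C(x, y) = sigma**2 * exp(-||x - y||**2 / (2 * lam**2)).

    Conditioning a Gaussian process with constant mean `mu` on the observed
    loss and gradient at `w`, the conditional expectation of the loss at
    distance s along the negative gradient direction is
        m(s) = mu + exp(-s**2 / (2 * lam**2)) * ((loss - mu) - s * ||grad||).
    Setting m'(s) = 0 gives the closed-form minimizer used below; sigma**2
    cancels out of the step size.
    """
    g_norm = np.linalg.norm(grad)
    if g_norm == 0.0:
        return w  # the surrogate is already minimized at w
    excess = loss - mu  # current loss relative to the prior mean
    # positive root of  s**2 * ||g|| - s * (loss - mu) - lam**2 * ||g|| = 0
    s = (excess + np.hypot(excess, 2.0 * lam * g_norm)) / (2.0 * g_norm)
    return w - s * grad / g_norm

# toy usage: descend a quadratic bowl with loss 0.5 * ||w||^2
w = np.array([3.0, -2.0])
for _ in range(20):
    loss, grad = 0.5 * w @ w, w
    w = rfd_step(w, loss, grad, mu=0.0, lam=1.0)
print(w)  # close to the minimizer at the origin
```

Because sigma**2 drops out, the step size depends only on the length scale lam and on how far the current loss sits above the prior mean mu, which is why no backtracking line search, and hence no extra quality evaluations, are needed.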


Related research

11/08/2015 · Speed learning on the fly
The practical performance of online stochastic gradient descent algorith...

04/03/2022 · Understanding the unstable convergence of gradient descent
Most existing analyses of (stochastic) gradient descent rely on the cond...

08/12/2013 · Faster gradient descent and the efficient recovery of images
Much recent attention has been devoted to gradient descent algorithms wh...

04/16/2018 · Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Stochastic gradient methods enable learning probabilistic models from la...

10/02/2020 · A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
A fundamental challenge in deep learning is that the optimal step sizes ...

09/04/2023 · Homomorphically encrypted gradient descent algorithms for quadratic programming
In this paper, we evaluate the different fully homomorphic encryption sc...

07/01/2020 · Decentralised Learning with Random Features and Distributed Gradient Descent
We investigate the generalisation performance of Distributed Gradient De...
