A Dynamic Sampling Adaptive-SGD Method for Machine Learning

12/31/2019
by Achraf Bahamou, et al.

We propose a stochastic optimization method for minimizing loss functions that can be expressed as an expected value. The method adaptively controls both the batch size used to compute gradient approximations and the step size used to move along such directions, eliminating the need for the user to tune the learning rate. It exploits local curvature information and ensures that search directions are descent directions with high probability by means of an acute-angle test. Under reasonable assumptions, the method is proved to attain a global linear rate of convergence on self-concordant functions with high probability. Numerical experiments show that this method is able to choose the best learning rates and compares favorably to fine-tuned SGD for training logistic regression and Deep Neural Networks (DNNs). We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM for training DNNs.
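The acute-angle idea in the abstract can be illustrated with a small sketch: grow the sample until the estimated variance of the per-example gradients is small relative to the norm of the averaged gradient, so that the average forms an acute angle with the true gradient (and hence is a descent direction) with high probability. The specific criterion, the threshold `theta`, the geometric batch growth, and the least-squares toy loss below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_example_grads(w, X, y):
    # Least-squares loss 0.5 * (x.w - y)^2; one gradient row per example.
    residuals = X @ w - y                # shape (n,)
    return residuals[:, None] * X        # shape (n, d)

def adaptive_batch_gradient(w, X, y, theta=0.9, growth=2):
    """Grow the sample until an acute-angle-style variance test passes.

    Illustrative criterion (an assumption, not the paper's exact test):
    accept the averaged gradient g once the estimated variance of the
    sample mean is at most theta^2 * ||g||^2, which bounds the angle
    between g and the true gradient with high probability.
    """
    n = len(y)
    size = min(8, n)
    while True:
        idx = rng.choice(n, size=size, replace=False)
        G = per_example_grads(w, X[idx], y[idx])
        g = G.mean(axis=0)
        # Estimated variance of the sample-mean gradient.
        var = G.var(axis=0, ddof=1).sum() / size
        if var <= (theta ** 2) * np.dot(g, g) or size == n:
            return g, size
        size = min(growth * size, n)

# Toy usage: one adaptive-batch gradient step on synthetic data.
X = rng.normal(size=(1000, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=1000)
w = np.zeros(5)
g, batch = adaptive_batch_gradient(w, X, y)
w -= 0.1 * g   # fixed step here; the paper also adapts the step size
```

A full implementation would couple this sampling test with the curvature-based step-size rule the abstract describes; the fixed step above merely stands in for it.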



Related research

05/23/2023 · Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning
We propose a new per-layer adaptive step-size procedure for stochastic f...

10/30/2017 · Adaptive Sampling Strategies for Stochastic Optimization
In this paper, we propose a stochastic optimization method that adaptive...

02/25/2020 · Statistical Adaptive Stochastic Gradient Methods
We propose a statistical adaptive procedure called SALSA for automatical...

04/02/2022 · AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio
It is well known that we need to choose the hyper-parameters in Momentum...

03/02/2022 · Adaptive Gradient Methods with Local Guarantees
Adaptive gradient methods are the method of choice for optimization in m...

03/05/2021 · Second-order step-size tuning of SGD for non-convex optimization
In view of a direct and simple improvement of vanilla SGD, this paper pr...

05/27/2021 · Training With Data Dependent Dynamic Learning Rates
Recently many first and second order variants of SGD have been proposed ...
