Big Batch SGD: Automated Inference using Adaptive Batch Sizes

10/18/2016
by Soham De, et al.

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative "big batch" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.
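The core idea described above is to enlarge the batch whenever the noise in the batch gradient becomes large relative to the gradient itself, so that a fixed learning rate remains usable. The snippet below is a minimal, hypothetical sketch of that idea for 1-D parameter vectors, not the paper's actual algorithm or its statistical test; the names grad_fn, snr_threshold, and growth are illustrative assumptions.

```python
# Sketch: grow the batch whenever the estimated variance of the batch-mean
# gradient (noise) exceeds a threshold times the squared norm of the batch
# gradient (signal). Names and thresholds are illustrative, not from the paper.
import numpy as np

def big_batch_sgd(grad_fn, data, theta0, lr=0.1, batch_size=32,
                  snr_threshold=1.0, growth=2, max_iters=1000, seed=None):
    """grad_fn(theta, x) returns the gradient contribution of one example x."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    n = len(data)
    for _ in range(max_iters):
        batch_size = min(batch_size, n)
        idx = rng.choice(n, size=batch_size, replace=False)
        grads = np.stack([grad_fn(theta, data[i]) for i in idx])
        g = grads.mean(axis=0)                        # batch gradient (signal)
        # Estimated variance of the batch-mean gradient (noise).
        noise = grads.var(axis=0, ddof=1).sum() / batch_size
        signal = np.dot(g, g)
        # If noise dominates the signal, enlarge the batch before stepping.
        if noise > snr_threshold * signal and batch_size < n:
            batch_size *= growth
            continue
        theta -= lr * g                               # plain SGD step, fixed lr
    return theta
```

Because the batch is grown until the gradient estimate is high-fidelity, the sketch keeps the stepsize lr fixed rather than decaying it, which mirrors the property the abstract highlights for automating learning rate selection.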


Related research

10/18/2019 - Improving the convergence of SGD through adaptive batch sizes
Mini-batch stochastic gradient descent (SGD) approximates the gradient o...

10/21/2019 - Faster Stochastic Algorithms via History-Gradient Aided Batch Size Adaptation
Various schemes for adapting batch size have been recently proposed to a...

05/23/2018 - Predictive Local Smoothness for Stochastic Gradient Methods
Stochastic gradient methods are dominant in nonconvex optimization espec...

03/26/2022 - A Robust Optimization Method for Label Noisy Datasets Based on Adaptive Threshold: Adaptive-k
SGD does not produce robust results on datasets with label noise. Becaus...

07/09/2020 - AdaScale SGD: A User-Friendly Algorithm for Distributed Training
When using large-batch training to speed up stochastic gradient descent,...

04/09/2019 - On the Adaptivity of Stochastic Gradient-Based Optimization
Stochastic-gradient-based optimization has been a core enabling methodol...
