Ordering for Non-Replacement SGD

06/28/2023
by Yuetong Xu, et al.

One approach to reducing the run time and improving the efficiency of machine learning is to improve the convergence rate of the optimization algorithm used. Shuffling is an algorithmic technique widely used in machine learning, but it has only recently begun to receive theoretical attention. Given the different convergence rates established for random shuffling and incremental gradient descent, we seek an ordering that improves the convergence rate of the non-replacement form of the algorithm. Based on existing bounds on the distance between the optimal and current iterates, we derive an upper bound that depends on the gradients at the beginning of the epoch. Through analysis of this bound, we develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. We further verify our results through experiments on synthetic and real data sets. In addition, we combine the ordering with mini-batching and apply it to more complex neural networks, which show promising results.
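To make the idea concrete, the following is a minimal sketch of non-replacement SGD whose per-epoch pass order is chosen from the individual gradients computed at the start of the epoch. It is illustrative only: the descending-gradient-norm rule, the helper names epoch_ordering and sgd_without_replacement, and the least-squares example are assumptions for this sketch, not the paper's derived optimal ordering, which depends on the bound and the step-size regime.

import numpy as np

def epoch_ordering(grads):
    # Illustrative assumption: order examples by the norm of their
    # gradients at the start of the epoch, largest first.
    norms = np.linalg.norm(grads, axis=1)
    return np.argsort(-norms)

def sgd_without_replacement(x0, grad_fn, data, lr, epochs):
    # Non-replacement SGD: each epoch visits every example exactly once,
    # in an order fixed from the gradients evaluated at the epoch start.
    x = x0.copy()
    for _ in range(epochs):
        grads = np.stack([grad_fn(x, d) for d in data])  # gradients at epoch start
        for i in epoch_ordering(grads):
            x = x - lr * grad_fn(x, data[i])             # one pass over the data
        # a decreasing step-size schedule could shrink lr here
    return x

# Usage on a toy least-squares problem, f_i(x) = 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
data = list(zip(A, b))
grad_fn = lambda x, d: (d[0] @ x - d[1]) * d[0]
x_hat = sgd_without_replacement(np.zeros(10), grad_fn, data, lr=0.01, epochs=20)

The same ordering step can be applied to mini-batches by treating each batch's aggregated gradient as the quantity to sort on, in the spirit of the mini-batch extension mentioned in the abstract.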

