Trading-off variance and complexity in stochastic gradient descent

by Vatsal Shah et al.
The University of Texas at Austin

Stochastic gradient descent is the method of choice for large-scale machine learning problems, owing to its light per-iteration complexity. However, it lags behind its non-stochastic counterparts in convergence rate, due to the high variance introduced by stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming by introducing an update rule that requires infrequent passes over the entire input dataset to compute the full gradient. In this work, we propose CheapSVRG, a stochastic variance-reduction optimization scheme. Our algorithm is similar to SVRG, but instead of the full gradient it uses a surrogate that can be computed efficiently on a small subset of the input data. It achieves a linear convergence rate, up to an error level that depends on the nature of the optimization problem, and features a trade-off between computational complexity and convergence rate. Empirical evaluation shows that CheapSVRG performs at least competitively with the state of the art.
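
To make the scheme concrete, below is a minimal sketch of an SVRG-style loop in which the full anchor gradient is replaced by an average over a small random subset of the data, matching the surrogate idea described above. All function names, defaults, and sampling details are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def cheap_svrg_sketch(grad_i, x0, n, epochs=20, m=None, step=0.1,
                      anchor_batch=64, rng=None):
    """Sketch of an SVRG-style loop with a subsampled anchor gradient.

    grad_i(x, i) -- gradient of the i-th component function at x.
    n            -- number of component functions (data points).
    Instead of SVRG's full pass over all n samples, the anchor
    (surrogate) gradient is averaged over `anchor_batch` random
    samples; this is an assumed reading of CheapSVRG, not the paper's
    exact algorithm.
    """
    rng = rng or np.random.default_rng(0)
    m = m if m is not None else n        # inner-loop length, as in SVRG
    x_anchor = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Surrogate of the full gradient: average over a small subset.
        S = rng.choice(n, size=min(anchor_batch, n), replace=False)
        g_anchor = np.mean([grad_i(x_anchor, i) for i in S], axis=0)
        x = x_anchor.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced update (SVRG-style correction term).
            v = grad_i(x, i) - grad_i(x_anchor, i) + g_anchor
            x = x - step * v
        x_anchor = x
    return x_anchor
```

For instance, for a least-squares objective with rows A[i] and targets b[i], one would pass grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]. Shrinking anchor_batch lowers the per-epoch cost but raises the residual error floor, which is the complexity/convergence trade-off the abstract refers to.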



A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity

SGD (Stochastic Gradient Descent) is a popular algorithm for large scale...

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimizat...

Computational Complexity of Sub-linear Convergent Algorithms

Optimizing machine learning algorithms that are used to solve the object...

Variance Suppression: Balanced Training Process in Deep Learning

Stochastic gradient descent updates parameters with summation gradient c...

Debiasing Stochastic Gradient Descent to handle missing values

A major caveat of large-scale data is their incompleteness. We propose ...

Stochastic Variance Reduced Gradient for affine rank minimization problem

We develop an efficient stochastic variance reduced gradient descent alg...

Stabilized Sparse Online Learning for Sparse Data

Stochastic gradient descent (SGD) is commonly used for optimization in l...
