Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets

02/20/2020
by Ilqar Ramazanli, et al.

We study distributed optimization algorithms for minimizing the average of heterogeneous functions distributed across several machines, with a focus on communication efficiency. In such settings, naively applying classical stochastic gradient descent (SGD) or its variants (e.g., SVRG) with uniform sampling of machines typically performs poorly, because the convergence rate then depends on the maximum Lipschitz constant of the gradients across the devices. In this paper, we propose a novel adaptive sampling of machines specifically tailored to these settings. Our method relies on an adaptive estimate of the local Lipschitz constants based on past gradient information. We show that this sampling scheme improves the dependence of the convergence rate from the maximum to the average Lipschitz constant across machines, thereby significantly accelerating convergence. Our experiments demonstrate that the method indeed speeds up the convergence of the standard SVRG algorithm in heterogeneous environments.
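To make the sampling idea concrete, below is a minimal single-process sketch (not the authors' code) of an SVRG-style loop that samples machines proportionally to running estimates of their local Lipschitz constants, obtained from differences of past gradients, and applies an importance-weighted correction so the stochastic update stays unbiased. All names (adaptive_svrg, estimate_lipschitz, local_grads) and the hyperparameter values are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def estimate_lipschitz(g_new, g_old, x_new, x_old, eps=1e-12):
    """Secant-style estimate: L ~ ||grad(x_new) - grad(x_old)|| / ||x_new - x_old||."""
    return np.linalg.norm(g_new - g_old) / (np.linalg.norm(x_new - x_old) + eps)

def adaptive_svrg(local_grads, x0, n_epochs=10, m=50, lr=0.005, seed=0):
    """local_grads: one full local-gradient callable per machine (hypothetical interface)."""
    rng = np.random.default_rng(seed)
    n = len(local_grads)
    L_hat = np.ones(n)                         # running local Lipschitz estimates
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_epochs):
        x_ref = x.copy()
        g_ref = [g(x_ref) for g in local_grads]  # full-gradient round (SVRG snapshot)
        full_grad = np.mean(g_ref, axis=0)
        for _ in range(m):
            p = L_hat / L_hat.sum()              # sample machines proportionally to estimates
            i = rng.choice(n, p=p)
            g_i = local_grads[i](x)
            # importance weight 1/(n*p_i) keeps the update unbiased under non-uniform sampling
            v = (g_i - g_ref[i]) / (n * p[i]) + full_grad
            # refresh the estimate for the sampled machine from its past and current gradients
            L_hat[i] = max(L_hat[i], estimate_lipschitz(g_i, g_ref[i], x, x_ref))
            x = x - lr * v
    return x

# Toy usage: two "machines" holding quadratics with very different curvature.
A = [np.diag([100.0, 1.0]), np.diag([1.0, 1.0])]
local_grads = [lambda x, M=A_i: M @ x for A_i in A]
x_final = adaptive_svrg(local_grads, x0=np.array([1.0, 1.0]))
```

The importance weight 1/(n*p_i) is what keeps the gradient estimator unbiased while the sampling distribution concentrates work on machines with large estimated Lipschitz constants; under the assumptions of this sketch, that is the mechanism by which the rate's dependence shifts from the maximum to the average Lipschitz constant.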
