D^2: Decentralized Training over Decentralized Data

03/19/2018
by Hanlin Tang, et al.

When a machine learning model is trained by multiple workers, each collecting data from its own source, it is most useful if the data held by different workers can be unique and different. Ironically, recent analyses of decentralized parallel stochastic gradient descent (D-PSGD) rely on the assumption that the data hosted on different workers are not too different. In this paper, we ask: Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers? We present D^2, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D^2 is a variance reduction extension of the standard D-PSGD algorithm, which improves the convergence rate from O(σ/√(nT) + (nζ^2)^(1/3)/T^(2/3)) to O(σ/√(nT)), where ζ^2 denotes the variance among the data on different workers. As a result, D^2 is robust to data variance among workers. We empirically evaluate D^2 on image classification tasks in which each worker has access to data from only a limited set of labels, and find that D^2 significantly outperforms D-PSGD.
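
The abstract only describes D^2 at a high level, so the sketch below is an illustrative NumPy simulation rather than code from the paper: the exact recursion used for the variance-reduction step, the ring mixing matrix, and the function and variable names (ring_mixing_matrix, d2_train, the toy heterogeneous least-squares objective) are all assumptions made for demonstration.

```python
# Minimal sketch of a D^2-style update (variance-reduction extension of D-PSGD),
# simulated on a toy problem where each worker's data distribution differs.
# The update form and the toy objective are illustrative assumptions.

import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n workers."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def d2_train(grad, x0, n_workers, n_steps, lr):
    """Run a D^2-style recursion: x_{t+1} = W (2 x_t - x_{t-1} - lr (g_t - g_{t-1})),
    where grad(i, x) returns worker i's stochastic gradient at x."""
    W = ring_mixing_matrix(n_workers)
    X = np.tile(x0, (n_workers, 1))            # one model copy per worker (rows)
    G = np.stack([grad(i, X[i]) for i in range(n_workers)])
    X_prev, G_prev = X.copy(), G.copy()
    X = W @ (X - lr * G)                       # first step: plain D-PSGD step
    for _ in range(n_steps - 1):
        G = np.stack([grad(i, X[i]) for i in range(n_workers)])
        X_next = W @ (2 * X - X_prev - lr * (G - G_prev))
        X_prev, G_prev, X = X, G, X_next
    return X.mean(axis=0)                      # consensus estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 5
    # Heterogeneous workers: each has its own target b_i, so local optima differ
    # (large zeta^2 in the abstract's notation).
    B = rng.normal(size=(n, d)) * 5.0

    def grad(i, x):
        # Stochastic gradient of 0.5 * ||x - b_i||^2 with additive noise (sigma).
        return (x - B[i]) + 0.1 * rng.normal(size=d)

    x_hat = d2_train(grad, np.zeros(d), n_workers=n, n_steps=2000, lr=0.05)
    print("D^2 estimate:       ", np.round(x_hat, 3))
    print("global optimum b_bar:", np.round(B.mean(axis=0), 3))
```

On this toy problem each worker's local optimum b_i is far from the global optimum (the mean of the b_i), which is exactly the large-ζ^2 regime the abstract targets; under the assumed recursion the printed D^2 estimate should land close to that global optimum despite the heterogeneity.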


