Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks

by Zhaoxian Wu, et al.

This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the 'honest' workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When fewer than half of the workers are Byzantine attackers, the robustness of the geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
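To make the aggregation step in the abstract concrete, the following is a minimal sketch of how a server might combine worker messages with the geometric median instead of the mean. It is an illustration only, not the authors' implementation: the abstract names no solver, so the use of Weiszfeld's iteration is an assumption, and the names `geometric_median`, `mean`, `honest`, and `byzantine` and the toy numbers are all hypothetical.

```python
import math


def geometric_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median of equal-length vectors.

    Uses Weiszfeld's iteration (an assumed solver, not taken from the paper).
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean of the messages.
    z = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # Weight each point by the inverse of its distance to the estimate;
        # the eps floor guards against division by zero.
        weights = [1.0 / max(math.dist(p, z), eps) for p in points]
        total = sum(weights)
        z_new = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        shift = math.dist(z_new, z)
        z = z_new
        if shift < eps:
            break
    return z


def mean(points):
    """Coordinate-wise mean, the aggregation rule of plain distributed SAGA."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


# Hypothetical toy messages: three honest workers send similar corrected
# gradients, while one Byzantine worker sends an arbitrary vector.
honest = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.2]]
byzantine = [[100.0, -100.0]]
messages = honest + byzantine

gm = geometric_median(messages)
avg = mean(messages)
print("geometric median:", gm)
print("mean:", avg)
```

In this toy setting the single outlier drags the mean far from the honest cluster, while the geometric median stays close to it, which is the robustness property the abstract relies on when fewer than half of the workers are Byzantine.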


Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data

We propose a Byzantine-robust variance-reduced stochastic gradient desce...

The Hidden Vulnerability of Distributed Learning in Byzantium

While machine learning is going through an era of celebrated success, co...

Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

In this work, we consider the resilience of distributed algorithms based...

Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

Distributed optimization with open collaboration is a popular field sinc...

Learning from History for Byzantine Robust Optimization

Byzantine robustness has received significant attention recently given i...

A simplified convergence theory for Byzantine resilient stochastic gradient descent

In distributed learning, a central server trains a model according to up...

Distributed Momentum for Byzantine-resilient Learning

Momentum is a variant of gradient descent that has been proposed for its...
