FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

by Haoyu Zhao, et al.

Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by using the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses far fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(N^{3/4}/(S\epsilon))$, improving the best-known result $O(N/(S\epsilon))$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually very large in federated learning), $S$ is the number of clients sampled in each communication round, and $\epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O((\sqrt{N}+S)/(S\epsilon^2))$, improving the best-known result $O(N^{2/3}/(S^{2/3}\epsilon^2))$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/6}S^{1/3}$ when the number of sampled clients satisfies $S \le \sqrt{N}$. In both settings, the communication cost per round is the same for FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.
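
To get a feel for the size of the stated improvement factors, the bounds above can be evaluated numerically. The following is a minimal sketch, not code from the paper: it drops the constants and any other factors hidden by the $O(\cdot)$ notation, so the outputs are only relative orders of magnitude, and the function names and the example values of N, S, and eps are chosen purely for illustration.

```python
import math

# Round-count bounds from the abstract with all hidden constants set to 1.
# The function names are hypothetical; they only mirror the stated formulas.

def fedpage_convex(n_clients, n_sampled, eps):
    # O(N^{3/4} / (S * eps))
    return n_clients ** 0.75 / (n_sampled * eps)

def scaffold_convex(n_clients, n_sampled, eps):
    # O(N / (S * eps))
    return n_clients / (n_sampled * eps)

def fedpage_nonconvex(n_clients, n_sampled, eps):
    # O((sqrt(N) + S) / (S * eps^2))
    return (math.sqrt(n_clients) + n_sampled) / (n_sampled * eps ** 2)

def scaffold_nonconvex(n_clients, n_sampled, eps):
    # O(N^{2/3} / (S^{2/3} * eps^2))
    return n_clients ** (2 / 3) / (n_sampled ** (2 / 3) * eps ** 2)

# Illustrative values (assumptions): 10,000 clients, 10 sampled per round,
# so S <= sqrt(N) = 100 holds as the nonconvex comparison requires.
N, S, eps = 10_000, 10, 0.1

convex_ratio = scaffold_convex(N, S, eps) / fedpage_convex(N, S, eps)
nonconvex_ratio = scaffold_nonconvex(N, S, eps) / fedpage_nonconvex(N, S, eps)

print(f"convex speedup    ~ {convex_ratio:.2f}  (N^(1/4) = {N ** 0.25:.2f})")
print(f"nonconvex speedup ~ {nonconvex_ratio:.2f}  "
      f"(N^(1/6) * S^(1/3) = {N ** (1 / 6) * S ** (1 / 3):.2f})")
```

With these values the convex ratio matches $N^{1/4} = 10$, while the nonconvex ratio comes out slightly below $N^{1/6}S^{1/3} = 10$ because the additive $S$ term in the FedPAGE bound is kept rather than absorbed into $\sqrt{N}$.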




Gradient Masked Federated Optimization

Federated Averaging (FedAVG) has become the most popular federated learn...

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

Due to the communication bottleneck in distributed and federated learnin...

Reducing the Communication Cost of Federated Learning through Multistage Optimization

A central question in federated learning (FL) is how to design optimizat...

SAGDA: Achieving 𝒪(ε^-2) Communication Complexity in Federated Min-Max Learning

To lower the communication complexity of federated min-max learning, a n...

A Communication-Efficient Adaptive Algorithm for Federated Learning under Cumulative Regret

We consider the problem of online stochastic optimization in a distribut...

Learning without Interaction Requires Separation

One of the key resources in large-scale learning systems is the number o...

Communication-Efficient Agnostic Federated Averaging

In distributed learning settings such as federated learning, the trainin...
