Personalized Federated Learning with Exact Stochastic Gradient Descent

In Federated Learning (FL), datasets across clients tend to be heterogeneous or personalized, which poses challenges to the convergence of standard FL schemes that do not account for personalization. To address this, we present a new approach to personalized FL that achieves exact stochastic gradient descent (SGD) minimization. We start from the FedPer (Arivazhagan et al., 2019) neural network (NN) architecture for personalization, in which the NN has two types of layers: the first layers are common across clients, while the final few are client-specific and provide the personalization. We propose a novel SGD-type scheme in which, at each optimization round, randomly selected clients perform gradient-descent updates over their client-specific weights to optimize the loss function on their own datasets, without updating the common weights. At the final update, each client computes the joint gradient over both the client-specific and the common weights, and returns the gradient of the common parameters to the server. This makes it possible to perform an exact and unbiased SGD step over the full set of parameters in a distributed manner: the updates of the personalized parameters are performed by the clients, and those of the common ones by the server. Our method outperforms the FedAvg and FedPer baselines on multi-class classification benchmarks such as Omniglot, CIFAR-10, MNIST, Fashion-MNIST, and EMNIST, and has much lower computational complexity per round.
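The round structure described above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not the authors' implementation: a two-layer linear model stands in for the NN (a common matrix `Wc` as the shared layers, a per-client vector `wp` as the personal head), every client participates in every round, and all hyperparameters (`lr_local`, `lr_server`, `local_steps`) are made-up values. The key mechanic is the one from the abstract: clients run local gradient descent on their personal weights only, then compute the joint gradient at the final point, keep the personal update, and ship only the common-weight gradient to the server.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_samples = 5, 3, 20  # input dim, shared-representation dim, samples/client

def make_client():
    # Heterogeneous synthetic data: each client has its own target function.
    X = rng.normal(size=(n_samples, d))
    y = (X @ rng.normal(size=(d, k))) @ rng.normal(size=k)
    return {"X": X, "y": y, "wp": rng.normal(size=k)}  # wp = personal head

def loss_and_grads(Wc, wp, X, y):
    """Squared-error loss of the two-layer linear model (X @ Wc) @ wp."""
    h = X @ Wc                                  # shared representation
    err = h @ wp - y
    loss = 0.5 * np.mean(err ** 2)
    g_wp = h.T @ err / len(y)                   # grad w.r.t. personal head
    g_Wc = np.outer(X.T @ err, wp) / len(y)     # grad w.r.t. common layer
    return loss, g_Wc, g_wp

def fl_round(Wc, clients, lr_local=0.05, lr_server=0.01, local_steps=5):
    common_grads = []
    for c in clients:
        wp = c["wp"]
        # Local GD on the client-specific head only; Wc stays frozen.
        for _ in range(local_steps):
            _, _, g_wp = loss_and_grads(Wc, wp, c["X"], c["y"])
            wp = wp - lr_local * g_wp
        # Final update: compute the joint gradient, apply the personal part
        # locally, and return only the common-weight gradient to the server.
        _, g_Wc, g_wp = loss_and_grads(Wc, wp, c["X"], c["y"])
        c["wp"] = wp - lr_local * g_wp
        common_grads.append(g_Wc)
    # Server-side SGD step over the common weights.
    return Wc - lr_server * np.mean(common_grads, axis=0)

clients = [make_client() for _ in range(3)]
Wc = rng.normal(size=(d, k))  # common (server-held) weights

mean_loss = lambda: np.mean(
    [loss_and_grads(Wc, c["wp"], c["X"], c["y"])[0] for c in clients])
before = mean_loss()
for _ in range(50):
    Wc = fl_round(Wc, clients)
after = mean_loss()
print(f"mean client loss: {before:.3f} -> {after:.3f}")
```

In this toy run the mean client loss decreases across rounds, mirroring the division of labor in the abstract: personalized parameters never leave the clients, while the server only ever sees gradients of the common parameters.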




