Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis

by Yuanxing Chen et al.

In diverse fields ranging from finance to omics, it is increasingly common that data are distributed across multiple individual sources (referred to as "clients" in some studies). Integrating raw data, although powerful, is often not feasible, for example, when privacy protection is a concern. Distributed learning techniques have been developed to integrate summary statistics rather than raw data. Many existing distributed learning studies stringently assume that all clients share the same model. To accommodate data heterogeneity, some federated learning methods allow for client-specific models. In this article, we consider the scenario in which clients form clusters: clients in the same cluster share the same model, while different clusters have different models. Accounting for this clustering structure can lead to a better understanding of the "interconnections" among clients and reduce the number of parameters. To this end, we develop a novel penalization approach. Specifically, a group penalty is imposed for regularized estimation and selection of important variables, and a fusion penalty is imposed to automatically cluster clients. An effective ADMM algorithm is developed, and the estimation, selection, and clustering consistency properties are established under mild conditions. Simulations and data analysis further demonstrate the practical utility and superiority of the proposed approach.
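To make the penalized formulation concrete, the sketch below evaluates an objective of the kind the abstract describes: client-wise least-squares losses, a group penalty over each variable's coefficients across clients (for variable selection), and a pairwise fusion penalty on client coefficient vectors (which drives clients toward shared, clustered models). This is a minimal illustration under assumed notation, not the paper's estimator: the function name, the plain least-squares loss, and the convex (L2-norm) penalties are simplifying assumptions, and the ADMM machinery and any concave penalty variants are omitted.

```python
import numpy as np

def clustered_objective(B, Xs, ys, lam1, lam2):
    """Penalized objective for M clients (a simplified sketch).

    B   : (M, p) coefficient matrix; row m holds client m's coefficients.
    Xs  : list of M design matrices, Xs[m] of shape (n_m, p).
    ys  : list of M response vectors, ys[m] of shape (n_m,).
    lam1: weight of the group penalty (variable selection).
    lam2: weight of the pairwise fusion penalty (client clustering).
    """
    M, p = B.shape
    # Client-wise least-squares loss, averaged within each client.
    loss = sum(
        np.sum((y - X @ B[m]) ** 2) / (2 * len(y))
        for m, (X, y) in enumerate(zip(Xs, ys))
    )
    # Group penalty: one group per variable, across all clients.
    group = lam1 * sum(np.linalg.norm(B[:, j]) for j in range(p))
    # Fusion penalty: charges every pair of clients whose models differ.
    fusion = lam2 * sum(
        np.linalg.norm(B[m] - B[k])
        for m in range(M) for k in range(m + 1, M)
    )
    return loss + group + fusion
```

In an estimated coefficient matrix, exact equality of rows (induced by the fusion penalty) identifies the clusters, so cluster membership can be read off by grouping identical (or nearly identical) rows of B.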

