FedSyn: Synthetic Data Generation using Federated Learning

by Monik Raj Behera, et al.

As deep learning algorithms evolve and become more sophisticated, they require massive datasets for training and for the resulting models to be effective. Some of these data requirements can be met with datasets that already exist within an organization, and current machine learning practices can be used to generate synthetic data from an existing dataset. It is well established, however, that the diversity of generated synthetic data depends on (and is perhaps limited by) the statistical properties of the dataset available within a single organization or entity: the more diverse the existing dataset, the more expressive and generic the synthetic data can be. Because the underlying data is scarce, it is challenging to collate big data in one organization. The diverse, non-overlapping datasets held by distinct organizations offer an opportunity: each organization can contribute its limited, distinct data to a larger pool from which further data can be synthesized. Unfortunately, pooling data raises privacy concerns that some institutions may not be comfortable with. This paper proposes FedSyn, a novel approach to generating synthetic data. FedSyn is a collaborative, privacy-preserving approach in which multiple participants in a federated network jointly build a synthetic data generation model. The resulting model can generate synthetic data that reflects the statistical distributions of almost all participants in the network. FedSyn does not require access to any individual participant's data, and so protects the privacy of that data. The proposed technique combines federated machine learning with a generative adversarial network (GAN) as the neural network architecture for synthetic data generation. The method can be extended to many machine learning problem classes in finance, health, governance, technology, and other domains.
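To make the federated idea concrete, the following is a minimal sketch of one aggregation round, assuming a FedAvg-style sample-weighted average of locally trained generator parameters. The abstract does not state which aggregation rule FedSyn uses or whether any extra privacy mechanism is applied to the updates, so the function `federated_average`, the parameter name `generator.layer1`, and the sample counts are all hypothetical and serve only as illustration. The point it shows is that only model parameters, never raw records, leave each participant.

```python
from typing import Dict, List

# A participant's model parameters: parameter name -> flat list of values.
# In practice these would come from a locally trained GAN generator.
Weights = Dict[str, List[float]]


def federated_average(local_weights: List[Weights], num_samples: List[int]) -> Weights:
    """Sample-weighted average of per-participant parameters (FedAvg-style).

    Assumption: every participant shares the same parameter names and shapes.
    """
    if not local_weights or len(local_weights) != len(num_samples):
        raise ValueError("need one sample count per participant")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name, reference in local_weights[0].items():
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(local_weights, num_samples)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Hypothetical example: two participants with 100 and 300 local samples.
participant_a: Weights = {"generator.layer1": [1.0, 2.0]}
participant_b: Weights = {"generator.layer1": [3.0, 6.0]}

global_weights = federated_average([participant_a, participant_b], [100, 300])
print(global_weights)
```

In a full round, the aggregated parameters would be sent back to every participant as the starting point for the next round of local GAN training.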


Synthetic Demographic Data Generation for Card Fraud Detection Using GANs

Using machine learning models to generate synthetic data has become comm...

Federated Learning with GAN-based Data Synthesis for Non-IID Clients

Federated learning (FL) has recently emerged as a popular privacy-preser...

Privacy-Preserving Synthetic Educational Data Generation

Institutions collect massive learning traces but they may not disclose i...

Cluster Aware Mobility Encounter Dataset Enlargement

The recent emerging fields in data processing and manipulation has facil...

Generating Synthetic Data in a Secure Federated General Adversarial Networks for a Consortium of Health Registries

In this work, we review the architecture design of existing federated Ge...

Conditional Synthetic Data Generation for Personal Thermal Comfort Models

Personal thermal comfort models aim to predict an individual's thermal c...

How Generative Models Improve LOS Estimation in 6G Non-Terrestrial Networks

With the advent of 5G and the anticipated arrival of 6G, there has been ...
