GTV: Generating Tabular Data via Vertical Federated Learning

by Zilong Zhao, et al.

Generative Adversarial Networks (GANs) have achieved state-of-the-art results in tabular data synthesis, under the presumption that the training data is directly accessible. Vertical Federated Learning (VFL) is a paradigm that allows machine learning models to be trained in a distributed manner by clients that hold unique features pertaining to the same individuals; tabular data learning is its primary use case. However, it is unknown whether tabular GANs can be learned in VFL. The demand for secure data transfer among the clients and the GAN during training and data synthesis poses an extra challenge. The conditional vector of tabular GANs is a valuable tool for controlling specific features of the generated data, but it contains sensitive information from the real data, which puts privacy guarantees at risk. In this paper, we propose GTV, a VFL framework for tabular GANs whose key components are the generator, the discriminator and the conditional vector. GTV proposes a unique distributed training architecture that lets the generator and discriminator access training data in a privacy-preserving manner. To accommodate the conditional vector in training without privacy leakage, GTV designs a mechanism, training-with-shuffling, which ensures that no party can reconstruct the training data using the conditional vector. We evaluate the effectiveness of GTV in terms of synthetic data quality and overall training scalability. Results show that GTV can consistently generate high-fidelity synthetic tabular data of quality comparable to that generated by a centralized GAN algorithm. The difference in machine learning utility can be as low as 2.7%, even under imbalanced data distributions across clients and with different numbers of clients.
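The abstract does not spell out how training-with-shuffling works, so the following is only a rough illustration of the general idea, not GTV's implementation. It assumes that clients hold vertically partitioned rows already aligned by individual, that the clients share a per-epoch random seed the server does not know, and that the conditional vector is a CTGAN-style one-hot encoding over the categories of one discrete column. The toy data and every name (`CLIENT_A`, `shared_permutation`, `sample_for_condition`, etc.) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Columns = Dict[str, List[str]]

# Hypothetical toy data: 6 individuals whose features are split vertically
# across two clients. Row i on each client describes the same individual.
CLIENT_A: Columns = {"age_bucket": ["young", "old", "young", "old", "mid", "young"]}
CLIENT_B: Columns = {"income": ["low", "high", "mid", "mid", "high", "low"]}


def shared_permutation(n_rows: int, seed: int) -> List[int]:
    """Permutation derived from a seed assumed known to the clients only."""
    rng = random.Random(seed)
    perm = list(range(n_rows))
    rng.shuffle(perm)
    return perm


def shuffle_client(columns: Columns, perm: Sequence[int]) -> Columns:
    """Apply the same permutation to every local column, preserving alignment."""
    return {name: [values[i] for i in perm] for name, values in columns.items()}


def conditional_vector(columns: Columns, column: str, category: str) -> List[int]:
    """CTGAN-style one-hot condition over the sorted categories of one column."""
    categories = sorted(set(columns[column]))
    return [1 if c == category else 0 for c in categories]


def sample_for_condition(
    clients: List[Columns], epoch_seed: int, column: str, category: str
) -> Tuple[List[int], List[int], List[Columns]]:
    """Return (condition, shuffled row indices, each client's local slice)."""
    n_rows = len(next(iter(clients[0].values())))
    perm = shared_permutation(n_rows, epoch_seed)
    shuffled = [shuffle_client(c, perm) for c in clients]
    # The client that owns the conditioned column builds the condition.
    owner = next(c for c in shuffled if column in c)
    cond = conditional_vector(owner, column, category)
    # Indices refer to this epoch's shuffled order, not the original row order.
    idx = [i for i, v in enumerate(owner[column]) if v == category]
    # Every client selects the same indices from its own shuffled columns.
    slices = [{k: [v[i] for i in idx] for k, v in c.items()} for c in shuffled]
    return cond, idx, slices


if __name__ == "__main__":
    cond, idx, slices = sample_for_condition(
        [CLIENT_A, CLIENT_B], epoch_seed=7, column="age_bucket", category="young"
    )
    print(cond)  # [0, 0, 1]  (categories sorted as mid, old, young)
```

Because every client applies the same permutation, index `i` still refers to the same individual on all clients, so the selected slices stay aligned. Any row indices that would be exchanged with a coordinating party, however, point into an order that changes with each epoch seed, which is the kind of property a shuffling step is meant to provide.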

