Bias Mitigated Learning from Differentially Private Synthetic Data: A Cautionary Tale

08/24/2021
by Sahra Ghalebikesabi, et al.

Increasing interest in privacy-preserving machine learning has led to new models for synthetic private data generation from undisclosed real data. However, mechanisms of privacy preservation introduce artifacts in the resulting synthetic data that have a significant impact on downstream tasks such as learning predictive models or inference. In particular, bias can affect all analyses, as the synthetic data distribution is an inconsistent estimate of the real-data distribution. We propose several bias mitigation strategies using privatized likelihood ratios that have general applicability to differentially private synthetic data generative models. Through large-scale empirical evaluation, we show that bias mitigation provides simple and effective privacy-compliant augmentation for general applications of synthetic data. However, the work highlights that, even after bias correction, significant challenges remain regarding the usefulness of synthetic private data generators for tasks such as prediction and inference.
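To make the idea of likelihood-ratio bias mitigation concrete, the following is a minimal sketch of importance weighting on a toy one-dimensional problem. It is an illustration under stated assumptions, not the authors' method: the Gaussian "real" and "synthetic" distributions, the helper names `fit_logistic` and `quadratic_features`, and the plain gradient-descent classifier are all hypothetical choices made here. In particular, the ratio estimate below is not privatized, so the sketch carries no differential privacy guarantee; it only shows how reweighting synthetic samples by an estimated ratio of real to synthetic densities can reduce bias in a downstream estimate.

```python
import numpy as np


def fit_logistic(features, labels, steps=2000, lr=0.1):
    """Plain gradient-descent logistic regression (non-private, illustrative only)."""
    coef = np.zeros(features.shape[1])
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ coef)))
        coef -= lr * features.T @ (probs - labels) / len(labels)
    return coef


def quadratic_features(x):
    """Features [1, x, x^2]; the log-ratio of two Gaussians is quadratic in x."""
    return np.column_stack([np.ones_like(x), x, x ** 2])


rng = np.random.default_rng(0)
# Assumed toy setup: the synthetic distribution is a biased copy of the real one.
real = rng.normal(0.0, 1.0, size=5000)
synthetic = rng.normal(0.5, 1.2, size=5000)

# Train a classifier to tell real (label 1) from synthetic (label 0) samples.
features = quadratic_features(np.concatenate([real, synthetic]))
labels = np.concatenate([np.ones_like(real), np.zeros_like(synthetic)])
coef = fit_logistic(features, labels)

# With equal sample sizes, the classifier odds estimate p_real(x) / p_synth(x).
ratio = np.exp(quadratic_features(synthetic) @ coef)
weights = ratio / ratio.sum()  # self-normalized importance weights

naive_mean = synthetic.mean()
weighted_mean = np.sum(weights * synthetic)
print("naive synthetic mean:   ", naive_mean)
print("importance-weighted mean:", weighted_mean)
```

The real-data mean in this toy setup is 0, so the weighted estimate should land closer to 0 than the naive synthetic mean does. A privacy-compliant version would need the ratio estimator itself to be trained under differential privacy, which this sketch deliberately leaves out.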


research
07/01/2023

When Synthetic Data Met Regulation

In this paper, we argue that synthetic data produced by Differentially P...
research
06/20/2023

Diverse Community Data for Benchmarking Data Privacy Algorithms

The Diverse Communities Data Excerpts are the core of a National Institu...
research
04/21/2023

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

Data collected from the real world tends to be biased, unbalanced, and a...
research
11/07/2022

Private Set Generation with Discriminative Information

Differentially private data generation techniques have become a promisin...
research
08/08/2023

From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data

Visual recognition models are prone to learning spurious correlations in...
research
10/05/2022

Differentially Private Propensity Scores for Bias Correction

In surveys, it is typically up to the individuals to decide if they want...
research
07/12/2022

dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation

We propose a general, flexible, and scalable framework dpart, an open so...
