Improving the Utility of Poisson-Distributed, Differentially Private Synthetic Data via Prior Predictive Truncation with an Application to CDC WONDER

by   Harrison Quick, et al.

CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy protections such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the U.S. Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprised of over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania belonging to over 47,000 combinations of cause-of-death and demographic variables such as age, race, sex, and county-of-residence and demonstrate the proposed framework's ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.


Generating Poisson-Distributed Differentially Private Synthetic Data

The dissemination of synthetic data can be an effective means of making ...

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Deep learning models have demonstrated superior performance in several a...

Differentially Private Algorithms for 2020 Census Detailed DHC Race & Ethnicity

This article describes a proposed differentially private (DP) algorithms...

Utility-Optimized Synthesis of Differentially Private Location Traces

Differentially private location trace synthesis (DPLTS) has recently eme...

Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

Differential privacy mechanisms are increasingly used to enable public r...

Please sign up or login with your details

Forgot password? Click here to reset