Fully Synthetic Data for Complex Surveys

09/17/2023
by Shirley Mathur, et al.

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as the public use files. We use the framework of multiple imputation to enable variance estimation using two data generation strategies. In the first, we generate multiple data sets from each simple random sample, whereas in the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate each approach and the repeated sampling properties of the combining rules using simulation studies.
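The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `wfpbb_pseudo_population` and `synthesize` are hypothetical, the weights are assumed to be at least 1 and to sum to the population size N, only the weighted Pólya urn step of the weighted finite population Bayesian bootstrap is shown, and the synthesis model is a plug-in normal distribution chosen purely for brevity. Setting R > 1 corresponds to the first strategy (several synthetic data sets per simple random sample) and R = 1 to the second. The combining rules are not shown because the abstract does not state them.

```python
import numpy as np


def wfpbb_pseudo_population(w, rng):
    """Return unit indices forming one pseudo-population of size N = round(sum(w)).

    Sketch of a weighted Polya urn: the n sampled units are kept, and the
    remaining N - n units are drawn one at a time, with selection
    probabilities that grow each time a unit is drawn.
    Assumes every weight is >= 1 and the weights sum to N.
    """
    n = len(w)
    N = int(round(w.sum()))
    ratio = (N - n) / n
    counts = np.zeros(n)  # number of times each unit has been drawn so far
    drawn = np.empty(N - n, dtype=int)
    for k in range(N - n):
        p = w - 1.0 + counts * ratio  # unnormalized urn probabilities
        i = rng.choice(n, p=p / p.sum())
        counts[i] += 1
        drawn[k] = i
    return np.concatenate([np.arange(n), drawn])


def synthesize(y, w, M, R, rng):
    """Generate M groups of R synthetic data sets, one group per pseudo-population."""
    n = len(y)
    groups = []
    for _ in range(M):
        pop_idx = wfpbb_pseudo_population(w, rng)
        # Simple random sample (without replacement) from the pseudo-population.
        srs = y[rng.choice(pop_idx, size=n, replace=False)]
        # Assumed synthesis model: plug-in normal fitted to the SRS.
        mu, sd = srs.mean(), srs.std(ddof=1)
        groups.append([rng.normal(mu, sd, size=n) for _ in range(R)])
    return groups


if __name__ == "__main__":
    rng = np.random.default_rng(2023)
    y = np.array([1.0, 2.0, 3.0, 4.0])   # toy survey responses
    w = np.array([2.0, 3.0, 5.0, 10.0])  # toy survey weights, summing to N = 20
    synthetic = synthesize(y, w, M=3, R=2, rng=rng)
    print(len(synthetic), len(synthetic[0]), synthetic[0][0].shape)
```

A fully Bayesian synthesizer would draw model parameters from their posterior distribution before simulating data; the plug-in fit here only keeps the sketch short.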


