Identification Risk Evaluation of Continuous Synthesized Variables

06/01/2020
by   Ryan Hornby, et al.
0

We propose a general approach to evaluating identification risk of continuous synthesized variables in partially synthetic data. We introduce the use of a radius r in the construction of identification risk probability of each target record, and illustrate with working examples for one or more continuous synthesized variables. We demonstrate our methods with applications to a data sample from the Consumer Expenditure Surveys (CE), and discuss the impacts on risk and data utility of 1) the choice of radius r, 2) the choice of synthesized variables, and 3) the choice of number of synthetic datasets. We give recommendations for statistical agencies for synthesizing and evaluating identification risk of continuous variables. An R package is created to perform our proposed methods of identification risk evaluation, and sample R scripts are included.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset