Downstream Fairness Caveats with Synthetic Healthcare Data

03/09/2022
by Karan Bhanot, et al.

This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on utility and fairness. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Although essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data similar to real healthcare data without privacy risks. Healthcare datasets may contain biases under which certain protected groups experience worse outcomes than others. Because the real data can be biased, the fairness of synthetically generated health data comes into question. In this paper, we evaluate the fairness of models trained on two healthcare datasets for gender and race biases. We generate synthetic versions of the datasets using a Generative Adversarial Network called HealthGAN, and compare the balanced accuracy and fairness scores of models trained on the real and synthetic data. We find that synthetic data has different fairness properties than real data and that fairness mitigation techniques perform differently on it, highlighting that synthetic data is not bias-free.
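To make the comparison of balanced accuracy and fairness scores concrete, the following is a minimal sketch on toy data. The abstract does not say which fairness metric the authors use, so the equal opportunity difference (the gap in true positive rate between an unprivileged and a privileged group) is an assumption chosen for illustration. The function names, labels, and group values are likewise hypothetical and not taken from the paper.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted as 1."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return 0.0
    return sum(preds_on_positives) / len(preds_on_positives)


def true_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual negatives (label 0) that were predicted as 0."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_negatives:
        return 0.0
    return sum(1 - p for p in preds_on_negatives) / len(preds_on_negatives)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true positive rate and the true negative rate."""
    return 0.5 * (true_positive_rate(y_true, y_pred)
                  + true_negative_rate(y_true, y_pred))


def equal_opportunity_difference(y_true, y_pred, group, privileged) -> float:
    """TPR of the unprivileged group minus TPR of the privileged group.

    A value of 0 means both groups have the same true positive rate.
    This metric is an assumption for illustration, not the paper's choice.
    """
    priv = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == privileged]
    unpriv = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g != privileged]
    tpr_priv = true_positive_rate([t for t, _ in priv], [p for _, p in priv])
    tpr_unpriv = true_positive_rate([t for t, _ in unpriv], [p for _, p in unpriv])
    return tpr_unpriv - tpr_priv


# Toy labels and predictions (hypothetical, not from either dataset).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
group = ["M", "M", "M", "M", "F", "F", "F", "F"]

print(balanced_accuracy(y_true, y_pred))                         # 0.75
print(equal_opportunity_difference(y_true, y_pred, group, "M"))  # -0.5
```

Running the same two functions on predictions from a model trained on real data and from a model trained on synthetic data would give one utility score and one fairness score per model, which is the kind of side-by-side comparison the abstract describes.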


