From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data

08/08/2023
by Maan Qraitem, et al.

Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set in which certain groups (e.g., Females) are under-represented in certain classes (e.g., Programmers). Generative models offer a promising way to mitigate this bias: they can generate synthetic data for the minority samples and thereby balance the training set. However, prior work that uses this approach overlooks that visual recognition models can often learn to differentiate between real and synthetic images, and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue, in which we 1) pre-train a model on a balanced synthetic dataset and then 2) fine-tune it on the real data. With this pipeline, the model is never trained on real and synthetic data at the same time, which avoids the bias between real and synthetic data. Moreover, the features learned in the first stage are robust to the bias, which helps mitigate the bias in the second stage. Our pipeline also integrates naturally with existing bias mitigation methods, since they can simply be applied in the fine-tuning stage. Our experiments show that the pipeline can further improve the performance of bias mitigation methods, obtaining state-of-the-art performance on three large-scale datasets.
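The two stages can be illustrated with a toy sketch. This is not the authors' implementation: a small logistic regression stands in for the visual recognition model, randomly generated feature vectors stand in for images, and every name and hyperparameter here (`sgd_logistic`, `make_samples`, `bias_strength`, the epoch counts and learning rates) is a hypothetical choice made for illustration. The only point it shows is the ordering: pre-train on balanced synthetic data alone, then fine-tune the resulting weights on real data alone, so the two sources are never mixed in one training run.

```python
import math
import random


def sgd_logistic(weights, samples, epochs=20, lr=0.1):
    """Run plain SGD for logistic regression, starting from `weights`.

    `samples` is a list of (features, label) pairs with label in {0, 1}.
    Returns a new weight list; the input list is left unchanged.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def make_samples(rng, n, bias_strength):
    """Toy stand-in for a dataset (assumption, not from the paper).

    Feature 0 is a noisy true signal, feature 1 is the bias attribute,
    feature 2 is a constant intercept term. With bias_strength=0.5 the
    attribute is independent of the label (balanced); values near 1.0
    make it strongly correlated with the label (biased).
    """
    samples = []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = (1.0 if y == 1 else -1.0) + rng.gauss(0.0, 1.0)
        agrees = rng.random() < bias_strength
        attr = 1.0 if (y == 1) == agrees else -1.0
        samples.append(([signal, attr, 1.0], y))
    return samples


rng = random.Random(0)
synthetic_balanced = make_samples(rng, 200, bias_strength=0.5)
real_biased = make_samples(rng, 200, bias_strength=0.9)

init = [0.0, 0.0, 0.0]
# Stage 1: pre-train on the balanced synthetic data only.
stage1 = sgd_logistic(init, synthetic_balanced, epochs=20, lr=0.1)
# Stage 2: fine-tune the stage-1 weights on the real data only.
stage2 = sgd_logistic(stage1, real_biased, epochs=5, lr=0.01)
```

A bias mitigation method would slot into stage 2, for example by resampling or reweighting `real_biased` before the fine-tuning call. The smaller learning rate and epoch count in stage 2 are illustrative choices, not values taken from the paper.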


