Leaving Reality to Imagination: Robust Classification via Generated Datasets

02/05/2023
by Hritik Bansal et al.

Recent research on robustness has revealed significant performance gaps for neural image classifiers between test data similar to the training set and test data from a naturally shifted distribution, such as sketches, paintings, and animations of the object categories observed during training. Prior work focuses on reducing this gap by designing engineered augmentations of training data or through unsupervised pretraining of a single large model on massive in-the-wild training datasets scraped from the Internet. However, the notion of a dataset has also been undergoing a paradigm shift in recent years. With drastic improvements in the quality, ease of use, and accessibility of modern generative models, generated data is pervading the web. In this light, we study the question: how do these generated datasets influence the natural robustness of image classifiers? We find that ImageNet classifiers trained on real data augmented with generated data achieve higher accuracy and effective robustness than standard training and popular augmentation strategies in the presence of natural distribution shifts. We analyze various factors influencing these results, including the choice of conditioning strategies and the amount of generated data. Lastly, we introduce and analyze an evolving generated dataset, ImageNet-G-v1, to better benchmark the design, utility, and critique of standalone generated datasets for robust and trustworthy machine learning. The code and datasets are available at https://github.com/Hritikbansal/generative-robustness.
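The abstract does not say how real and generated images are combined or how "the amount of generated data" is parameterized. The sketch below is only an illustration under stated assumptions: it keeps every real sample and adds a number of generated samples equal to a chosen fraction of the real dataset size, then shuffles. The function `mix_real_and_generated` and the `generated_fraction` parameter are hypothetical names introduced here; they are not the authors' code, which lives in the linked repository.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_real_and_generated(
    real: Sequence[T],
    generated: Sequence[T],
    generated_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: all real samples plus a subset of generated ones.

    generated_fraction (an assumption, not from the paper) sets how many
    generated samples to add relative to the real dataset size, e.g. 0.5 adds
    half as many generated samples as there are real ones. The count is capped
    at the number of generated samples available.
    """
    if generated_fraction < 0:
        raise ValueError("generated_fraction must be non-negative")
    # Number of generated samples to draw, capped by what is available.
    n_gen = min(len(generated), round(generated_fraction * len(real)))
    rng = random.Random(seed)  # seeded for reproducible sampling and shuffling
    # Keep every real sample; draw generated samples without replacement.
    mixed = list(real) + rng.sample(list(generated), n_gen)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for image records: (source, index) tuples.
    real = [("real", i) for i in range(10)]
    gen = [("gen", i) for i in range(100)]
    mixed = mix_real_and_generated(real, gen, generated_fraction=0.5)
    print(len(mixed))  # 10 real + 5 generated
```

Sweeping `generated_fraction` over several values is one simple way to study how the amount of generated data affects accuracy and robustness; the paper's actual mixing scheme may differ.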
