Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

11/20/2019
by Shiori Sagawa, et al.

Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, their poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization—stronger-than-typical ℓ_2 regularization or early stopping—we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce and give convergence guarantees for a stochastic optimizer for the group DRO setting, underpinning the empirical study above.
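To make the recipe concrete, below is a minimal sketch (not the authors' released implementation) of one training step of online group DRO in PyTorch: exponentiated-gradient ascent on the per-group weights q, gradient descent on the q-weighted loss, and an explicit ℓ_2 penalty. The names `group_dro_step`, `eta_q`, and `l2_reg`, and the classification setup, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def group_dro_step(model, optimizer, x, y, group_ids, group_weights,
                   n_groups, eta_q=0.01, l2_reg=0.1):
    """One stochastic group-DRO update on a minibatch (illustrative sketch).

    group_ids:     LongTensor (batch,) mapping each example to its group.
    group_weights: Tensor (n_groups,), a probability vector q over groups,
                   updated in place (initialize it uniform: ones / n_groups).
    eta_q:         step size for the multiplicative update on q.
    l2_reg:        L2 coefficient; per the paper, stronger-than-typical
                   regularization is key to worst-group generalization.
    """
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")

    # Average loss within each group present in the minibatch.
    one_hot = F.one_hot(group_ids, n_groups).float()   # (batch, n_groups)
    group_counts = one_hot.sum(dim=0)                  # (n_groups,)
    group_losses = (one_hot.t() @ per_example_loss) / group_counts.clamp(min=1)

    # Mirror ascent on the probability simplex: upweight high-loss groups.
    with torch.no_grad():
        group_weights *= torch.exp(eta_q * group_losses)
        group_weights /= group_weights.sum()

    # Descend on the q-weighted loss plus the L2 penalty.
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = group_weights @ group_losses + l2_reg * l2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, groups with persistently high loss accumulate weight multiplicatively, so the model's gradient step increasingly targets them; the strong ℓ_2 penalty (or, alternatively, early stopping) plays the role the abstract highlights, preventing the overparameterized model from simply memorizing the upweighted groups.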

Related research:

- AGRO: Adversarial Discovery of Error-prone Groups for Robust Optimization (12/02/2022)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations (05/09/2020)
- Examining and Combating Spurious Features under Distribution Shift (06/14/2021)
- Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization (05/20/2023)
- Just Mix Once: Worst-group Generalization by Group Interpolation (10/21/2022)
- Fairness Without Demographics in Repeated Loss Minimization (06/20/2018)
- Distributionally Robust Optimization with Probabilistic Group (03/10/2023)
