Improving Generalization via Uncertainty Driven Perturbations

by Matteo Pagliardini, et al.

Recently, Shah et al. (2020) pointed out the pitfalls of the simplicity bias, the tendency of gradient-based algorithms to learn simple models, which include the model's high sensitivity to small input perturbations as well as sub-optimal margins. In particular, while Stochastic Gradient Descent yields a max-margin boundary on linear models, this guarantee does not extend to non-linear models. To mitigate the simplicity bias, we consider uncertainty-driven perturbations (UDP) of the training data points, obtained iteratively by following the direction that maximizes the model's estimated uncertainty. Unlike loss-driven perturbations, uncertainty-driven perturbations do not cross the decision boundary, which allows a larger range of values for the hyperparameter that controls the magnitude of the perturbation. Moreover, as real-world datasets have non-isotropic distances between data points of different classes, this property is particularly appealing for increasing the margin of the decision boundary, which in turn improves the model's generalization. We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets. Interestingly, it also achieves a competitive trade-off between loss-based robustness and generalization on several datasets.
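The iterative perturbation described in the abstract can be illustrated with a minimal sketch on a binary linear (logistic) model. Everything specific here is an assumption made for illustration and is not taken from the abstract: the uncertainty estimate is the entropy of the predicted probability, each step follows the sign of the entropy gradient, and the perturbation is projected onto an L-infinity ball of radius `eps`. The names `udp_perturb`, `eps`, `alpha`, and `steps` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def entropy(p):
    # Binary entropy in nats; clamp p to avoid log(0).
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

def udp_perturb(w, b, x, eps, alpha, steps):
    """Sketch of an uncertainty-driven perturbation (assumed form):
    sign-gradient ascent on predictive entropy, projected onto the
    L-infinity ball of radius eps around the original point x."""
    x_adv = list(x)
    for _ in range(steps):
        z = logit(w, b, x_adv)
        p = sigmoid(z)
        # For p = sigmoid(z): dH/dp = log((1 - p) / p) = -z and dp/dz = p (1 - p),
        # so dH/dz = -z * p * (1 - p); the chain rule gives dH/dx_i = dH/dz * w_i.
        dH_dz = -z * p * (1.0 - p)
        x_adv = [xi + alpha * sign(dH_dz * wi) for xi, wi in zip(x_adv, w)]
        # Project each coordinate back into [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Toy usage: the decision boundary is the hyperplane x[0] = 0.
w, b = [1.0, 0.0], 0.0
x = [2.0, 0.5]
x_adv = udp_perturb(w, b, x, eps=1.0, alpha=0.25, steps=10)
print(x_adv, entropy(sigmoid(logit(w, b, x_adv))))
```

In this toy case the entropy gradient points toward the boundary and vanishes on it, so the perturbed point moves closer to the boundary and stays on its original side. With a finite step size the iterate can overshoot the boundary by up to one step, so the sketch only approximates the non-crossing property stated in the abstract.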



Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

The generalization mystery of overparametrized deep nets has motivated e...

Adversarial Attack for Uncertainty Estimation: Identifying Critical Regions in Neural Networks

We propose a novel method to capture data points near decision boundary ...

The Pitfalls of Simplicity Bias in Neural Networks

Several works have proposed Simplicity Bias (SB)—the tendency of standar...

Towards Understanding the Data Dependency of Mixup-style Training

In the Mixup training paradigm, a model is trained using convex combinat...

Stationary Point Losses for Robust Model

The inability to guarantee robustness is one of the major obstacles to t...

Linear Range in Gradient Descent

This paper defines linear range as the range of parameter perturbations ...
