Frustratingly Easy Model Generalization by Dummy Risk Minimization

08/04/2023
by Juncheng Wang, et al.

Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: simply enlarge the dimension of the output logits and then optimize with standard gradient descent. We validate the efficacy of DuRM through both theoretical and empirical analysis. Theoretically, we show that DuRM induces greater gradient variance, which facilitates generalization by helping the model converge to flatter local minima. Empirically, we evaluate DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition. Results demonstrate that DuRM consistently improves performance on all tasks in an almost free-lunch manner. Furthermore, we show that DuRM is compatible with existing generalization techniques, and we discuss its possible limitations. We hope that DuRM will trigger new interest in fundamental research on risk minimization.
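The recipe stated in the abstract is concrete enough to sketch in code. Below is a minimal PyTorch illustration of the idea as described: widen the classification head with extra dummy logits that never appear as training targets, then train with ordinary cross-entropy and gradient descent. The architecture, the `num_dummy` hyperparameter, and the training details here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

# Sketch of Dummy Risk Minimization (DuRM) as described in the abstract:
# enlarge the output logit dimension and optimize with standard gradient
# descent. `num_dummy` and the network below are assumed for illustration.

num_classes = 10  # real classes in the task
num_dummy = 2     # extra dummy logits (assumed hyperparameter)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    # The head outputs real + dummy logits instead of only the real ones.
    nn.Linear(256, num_classes + num_dummy),
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 1, 28, 28)            # toy batch of MNIST-sized inputs
y = torch.randint(0, num_classes, (32,))  # labels cover real classes only

optimizer.zero_grad()
logits = model(x)              # shape: (32, num_classes + num_dummy)
loss = criterion(logits, y)    # dummy classes never appear as targets
loss.backward()
optimizer.step()

# At inference time, predict over the real classes only.
pred = logits[:, :num_classes].argmax(dim=1)
```

In this reading, the dummy logits receive no positive targets and so only enter through the softmax normalization, altering the gradients rather than the prediction space, which is consistent with the abstract's claim that the effect comes through the gradient rather than any architectural change beyond the widened head.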

