What is a Good Metric to Study Generalization of Minimax Learners?

by Asuman Ozdaglar, et al.

Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains open: What is a good metric to study generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric for studying generalization in minimization problems that has recently also been adopted for minimax ones, fails in simple examples. To circumvent these issues, we propose a new metric to study generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also solve two open questions: establishing generalization error bounds for the primal risk and for the primal-dual risk (another existing metric, which is only well-defined when a global saddle point exists) in the strong sense, i.e., without assuming strong concavity or that the maximization and expectation can be interchanged; prior work needed one of these two assumptions. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in stochastic minimax optimization.
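
For readers who prefer symbols, the metrics named in the abstract can be written out as below. This is only a sketch: the notation (a loss f, model parameters w, dual variable θ, samples z_1, ..., z_n, and the symbols r, R, Δ) is assumed here for illustration and is not taken from the abstract, which defines the quantities only in words.

```latex
% Assumed notation: f(w, \theta; z) is the loss on sample z,
% w is the model (min player), \theta is the dual variable (max player),
% and S = \{z_1, \dots, z_n\} is the training set.
\begin{align*}
  r(w, \theta)   &= \mathbb{E}_{z}\big[ f(w, \theta; z) \big]
    && \text{(population risk)} \\
  r_S(w, \theta) &= \frac{1}{n} \sum_{i=1}^{n} f(w, \theta; z_i)
    && \text{(empirical risk)} \\
  R(w)           &= \max_{\theta} \, r(w, \theta), \qquad
  R_S(w)          = \max_{\theta} \, r_S(w, \theta)
    && \text{(primal risk, population and empirical)} \\
  \Delta(w)      &= R(w) - \min_{w'} R(w'), \qquad
  \Delta_S(w)     = R_S(w) - \min_{w'} R_S(w')
    && \text{(primal gap, population and empirical)}
\end{align*}
% A generalization error for the primal gap then compares the two
% versions at the learned model, e.g. \Delta(w) - \Delta_S(w).
```

Under this notation, the primal gap subtracts the best achievable primal risk, so it is zero exactly at a minimizer of R, whereas the primal risk itself carries no such reference point.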


Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

Many machine learning problems can be formulated as minimax problems suc...

Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency

Local SGD is a promising approach to overcome the communication overhead...

Data-Driven Minimax Optimization with Expectation Constraints

Attention to data-driven optimization approaches, including the well-kno...

Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap

We show that convex-concave Lipschitz stochastic saddle point problems (...

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses ...

Uniform Convergence and Generalization for Nonconvex Stochastic Minimax Problems

This paper studies the uniform convergence and generalization bounds for...

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Pairwise learning refers to learning tasks where the loss function depen...
