A Unified Approach to Controlling Implicit Regularization via Mirror Descent

by Haoyuan Sun et al.

Inspired by the remarkable success of deep neural networks, there has been significant interest in understanding the generalization performance of overparameterized models. Substantial effort has been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit ℓ_2-norm regularization in regression and classification problems. However, existing characterizations of the implicit regularization of different algorithms are confined to either a specific geometry or a particular class of learning problems, leaving a gap: there is no general approach for controlling the implicit regularization. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and, under suitable conditions, enjoys fast convergence. Through comprehensive experiments, we demonstrate that MD is a versatile method for producing learned models with different regularizers, which in turn have different generalization performances.
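To make the mechanism concrete, here is a minimal sketch of mirror descent with a homogeneous p-norm potential on an overparameterized least-squares problem. This is an illustrative implementation under our own assumptions (function names, step size, and the specific potential ψ(w) = (1/p)‖w‖_p^p are ours, not taken from the paper); the update runs in the mirror (dual) space, where the potential's gradient map determines which interpolating solution the iterates drift toward.

```python
import numpy as np

def mirror_descent_pnorm(X, y, p=3, lr=0.01, iters=10000):
    """Illustrative mirror descent on squared loss with the homogeneous
    potential psi(w) = (1/p) * ||w||_p^p.

    The MD update is applied in the mirror (dual) space:
        grad_psi(w_{t+1}) = grad_psi(w_t) - lr * grad_loss(w_t),
    where grad_psi(w)_i = sign(w_i) * |w_i|^(p-1), and the inverse
    mirror map is w_i = sign(z_i) * |z_i|^(1/(p-1)).
    """
    n, d = X.shape
    z = np.zeros(d)          # dual iterate, z = grad_psi(w)
    q = 1.0 / (p - 1)        # exponent of the inverse mirror map
    for _ in range(iters):
        w = np.sign(z) * np.abs(z) ** q   # map back to primal space
        grad = X.T @ (X @ w - y) / n      # gradient of mean squared loss
        z -= lr * grad                    # gradient step in dual space
    return np.sign(z) * np.abs(z) ** q

# Toy underdetermined regression: more parameters than samples,
# so many interpolating solutions exist; the potential picks one.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))
y = rng.normal(size=4)
w = mirror_descent_pnorm(X, y, p=3)
print(np.linalg.norm(X @ w - y))  # training residual shrinks over iterations
```

Swapping `p` changes the geometry of the potential and hence the implicit bias of the resulting interpolating solution; with p = 2 the inverse mirror map is the identity and the sketch reduces to plain gradient descent.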


