The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

by Bohan Wang, et al.

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize relatively well to unseen data. Recently, researchers have explained this by investigating the implicit bias of optimization algorithms. A remarkable advance is the work [18], which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Beyond first-order algorithms such as GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training. Meanwhile, numerous works have provided empirical evidence that adaptive methods may suffer from poor generalization performance. However, a theoretical explanation for the generalization of adaptive optimization algorithms is still lacking. In this paper, we study the implicit bias of adaptive optimization algorithms on homogeneous neural networks. In particular, we study the direction in which the parameters converge when optimizing the logistic loss. We prove that the convergent direction of RMSProp is the same as that of GD, while for AdaGrad, the convergent direction depends on the adaptive conditioner. Technically, we provide a unified framework for analyzing the convergent direction of adaptive optimization algorithms by constructing a novel and nontrivial adaptive gradient flow and surrogate margin. These theoretical findings explain the generalization advantage of the exponential moving average strategy adopted by RMSProp and Adam. To the best of our knowledge, this is the first work to study the convergent direction of adaptive optimization algorithms on non-linear deep neural networks.
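The distinction the abstract draws between the two conditioners can be made concrete with a minimal one-dimensional sketch. The function names, learning rate, and decay parameter below are illustrative choices, not taken from the paper: AdaGrad accumulates all past squared gradients, so its conditioner carries the entire optimization history, while RMSProp's exponential moving average forgets old gradients and tracks only the recent gradient scale.

```python
import math

def adagrad_step(w, grad, v, lr=0.1, eps=1e-8):
    # AdaGrad conditioner: cumulative sum of ALL past squared gradients;
    # the effective step size keeps shrinking with the full history.
    v = v + grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v

def rmsprop_step(w, grad, v, lr=0.1, beta=0.9, eps=1e-8):
    # RMSProp conditioner: exponential moving average of squared
    # gradients; contributions from old gradients decay at rate beta.
    v = beta * v + (1 - beta) * grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v

# Toy comparison on f(w) = w^2 (gradient 2w), starting from w = 5.
w_a, v_a = 5.0, 0.0
w_r, v_r = 5.0, 0.0
for _ in range(500):
    w_a, v_a = adagrad_step(w_a, 2 * w_a, v_a)
    w_r, v_r = rmsprop_step(w_r, 2 * w_r, v_r)
```

Because the EMA keeps RMSProp's normalized step size roughly constant, its conditioner asymptotically reflects only the current gradient scale, which is the intuition behind its GD-like convergent direction; AdaGrad's ever-growing accumulator is what lets the conditioner influence the limit direction.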


