Channel-Directed Gradients for Optimization of Convolutional Neural Networks

08/25/2020
by Dong Lao, et al.

We introduce optimization methods for convolutional neural networks that improve the generalization error of existing gradient-based optimizers. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and incurs only a linear overhead (in the number of parameters) relative to computing the stochastic gradient. It works by computing the gradient of the loss function with respect to output-channel directed re-weighted L2 or Sobolev metrics, which has the effect of smoothing the components of the gradient along a chosen direction of the parameter tensor. We show that defining the gradients along the output-channel direction leads to a performance boost, while other directions can be detrimental. We present the continuum theory of such gradients, its discretization, and application to deep networks. Experiments on benchmark datasets, several networks, and baseline optimizers show that optimizers can be improved in generalization error simply by computing the stochastic gradient with respect to output-channel directed metrics.

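To make the idea concrete, below is a minimal sketch, assuming a NumPy/SciPy setting: it applies a Sobolev-type smoothing (I + lam*L)^(-1) to a convolutional weight gradient along its output-channel axis, where L is a 1-D discrete Laplacian. The function name smooth_grad_output_channels, the smoothing strength lam, and the Neumann boundary handling are illustrative assumptions, not details taken from the paper. Because the system is tridiagonal, the solve is linear in the number of parameters, consistent with the overhead claimed in the abstract.

import numpy as np
from scipy.linalg import solve_banded

def smooth_grad_output_channels(grad, lam=0.1):
    """Apply Sobolev-type smoothing (I + lam*L)^(-1) g along axis 0,
    the output-channel axis of a conv gradient (C_out, C_in, kH, kW).
    L is the 1-D discrete Laplacian with Neumann boundaries.
    lam is an illustrative smoothing strength, not a value from the paper.
    """
    c = grad.shape[0]
    # Tridiagonal system A = I + lam * L in banded storage for solve_banded.
    diag = np.full(c, 1.0 + 2.0 * lam)
    diag[0] = diag[-1] = 1.0 + lam  # Neumann (replicated) boundary rows
    off = np.full(c - 1, -lam)
    ab = np.zeros((3, c))
    ab[0, 1:] = off    # superdiagonal
    ab[1, :] = diag    # main diagonal
    ab[2, :-1] = off   # subdiagonal
    # Solve the tridiagonal system for all remaining dimensions at once;
    # the cost is linear in the number of parameters.
    flat = grad.reshape(c, -1)
    return solve_banded((1, 1), ab, flat).reshape(grad.shape)

# Usage: filter each stochastic gradient before handing it to the optimizer.
g = np.random.randn(64, 32, 3, 3)
g_smoothed = smooth_grad_output_channels(g, lam=0.1)

In a training loop, this filtering step would sit between gradient computation and the optimizer update, which is how such a method composes with any baseline optimizer.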