A Stochastic Bundle Method for Interpolating Networks

01/29/2022
by Alasdair Paren, et al.

We propose a novel method for training deep neural networks that are capable of interpolation, that is, of driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and at other estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). To operationalise BORAT, we design a novel algorithm for efficiently optimising the bundle approximation at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single-hyperparameter optimisation algorithms. Our experiments demonstrate that BORAT matches the state-of-the-art generalisation performance of these methods and is the most robust.
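To make the adaptive learning rate concrete, below is a minimal sketch of one bundle-style proximal step in the simplest setting: a two-piece bundle consisting of the linearisation at the current iterate and the constant zero lower bound (valid under interpolation). In that special case, the exact minimiser of the bundle plus a proximal term is a Polyak-style step size clipped at a maximal learning rate. The names (bundle_step, eta_max) and the toy problem are illustrative assumptions, not the paper's implementation, which uses larger bundles and a dedicated inner solver.

# Minimal sketch of a bundle-style proximal step, assuming an interpolating
# model (empirical loss can be driven to zero) and a two-piece bundle: the
# linearisation at the current iterate plus the constant zero lower bound.
# In this special case, minimising the bundle plus a proximal term yields a
# Polyak-style step size clipped at a maximal learning rate eta_max.
# Names (bundle_step, eta_max) are illustrative, not the paper's API.
import numpy as np

def bundle_step(w, loss_value, grad, eta_max=0.1, eps=1e-8):
    """Minimise max(loss_value + grad.d, 0) + ||d||^2 / (2 * eta_max) over d."""
    # Closed-form solution: move along -grad with a clipped Polyak step.
    step = min(eta_max, loss_value / (np.dot(grad, grad) + eps))
    return w - step * grad

# Toy usage: stochastic steps on a least-squares problem that can be interpolated.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
w_true = rng.normal(size=4)
y = X @ w_true                      # zero loss is attainable at w_true
w = np.zeros(4)
for _ in range(500):
    i = rng.integers(len(X))        # sample one data point
    residual = X[i] @ w - y[i]
    loss, grad = 0.5 * residual**2, residual * X[i]
    w = bundle_step(w, loss, grad, eta_max=0.5)
print(f"parameter error: {np.linalg.norm(w - w_true):.2e}")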


Related research:

- 03/23/2022: An Adaptive Gradient Method with Energy and Momentum
  We introduce a novel algorithm for gradient-based optimization of stocha...
- 08/07/2018: Robust Implicit Backpropagation
  Arguably the biggest challenge in applying neural networks is tuning the...
- 06/13/2019: Training Neural Networks for and by Interpolation
  The majority of modern deep learning models are able to interpolate the ...
- 09/28/2021: slimTrain – A Stochastic Approximation Method for Training Separable Deep Neural Networks
  Deep neural networks (DNNs) have shown their success as high-dimensional...
- 01/25/2022: Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition
  Several studies have shown the ability of natural gradient descent to mi...
- 08/21/2023: We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond
  In the rapidly advancing field of deep learning, optimising deep neural ...
- 02/28/2019: Novel and Efficient Approximations for Zero-One Loss of Linear Classifiers
  The predictive quality of machine learning models is typically measured ...
