Self-Expanding Neural Networks

07/10/2023
by Rupert Mitchell, et al.

The results of training a neural network are heavily dependent on the architecture chosen, and even a small modification of only the network's size typically requires restarting the training process. In contrast, we begin training with a small architecture, increase its capacity only as the problem requires, and avoid interfering with previous optimization while doing so. We introduce a natural gradient based approach which intuitively expands both the width and the depth of a neural network when doing so is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
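
The abstract describes triggering width or depth expansion when a natural gradient based score indicates that added capacity would substantially reduce the attainable training loss. The following NumPy sketch illustrates one way such a criterion could be phrased; the function names, the block-diagonal Fisher approximation, the damping term, and the threshold `tau` are all illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch of a natural-gradient-style expansion criterion.
# All names and the exact scoring rule are assumptions for exposition,
# not the authors' published algorithm.
import numpy as np

def block_diag(a, b):
    """Stack two square matrices into a block-diagonal matrix."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out

def natural_gradient_score(grad, fisher, damping=1e-3):
    """g^T F^{-1} g: proportional (to second order) to the loss reduction
    attainable by a step along the natural gradient direction."""
    f = fisher + damping * np.eye(fisher.shape[0])
    return float(grad @ np.linalg.solve(f, grad))

def should_expand(current_grad, current_fisher,
                  candidate_grad, candidate_fisher, tau=2.0):
    """Expand (e.g. add a neuron) when including the candidate parameters
    would raise the attainable loss reduction by at least a factor tau."""
    base = natural_gradient_score(current_grad, current_fisher)
    joint = natural_gradient_score(
        np.concatenate([current_grad, candidate_grad]),
        block_diag(current_fisher, candidate_fisher))
    return joint >= tau * base

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g, f = rng.normal(size=4), np.eye(4)          # existing parameters
    g_new, f_new = 3.0 * rng.normal(size=2), np.eye(2)  # candidate neuron
    print(should_expand(g, f, g_new, f_new, tau=1.5))
```

Under this toy formulation, the threshold `tau` plays the role the abstract attributes to the expansion score: capacity grows only when the hypothetical gain is large relative to what the current parameters can already achieve.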
