Characterizing signal propagation to close the performance gap in unnormalized ResNets

01/21/2021
by Andrew Brock, et al.

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.
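The abstract points to two ingredients: simple tools that track per-channel activation statistics on the forward pass, and a scaled variant of Weight Standardization that keeps those statistics from drifting with depth. The sketch below, in PyTorch, illustrates one plausible form of this idea; the class name `ScaledStdConv2d`, the ReLU gain constant, and the depth probe are illustrative assumptions rather than the authors' reference implementation.

```python
# A minimal sketch (not the paper's reference code) of Scaled Weight
# Standardization, plus a crude forward-pass probe of per-channel
# activation statistics. Gain value and layer design are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledStdConv2d(nn.Conv2d):
    """Conv2d whose filters are standardized per output channel at each
    forward pass, then rescaled so the layer roughly preserves variance."""

    def __init__(self, *args, gain=math.sqrt(2.0 / (1.0 - 1.0 / math.pi)),
                 **kwargs):
        super().__init__(*args, **kwargs)
        # gain ~= 1.714 is the value commonly quoted for a ReLU
        # nonlinearity; treat it as an assumed constant here.
        self.gain = gain

    def forward(self, x):
        w = self.weight
        fan_in = w[0].numel()
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        # Zero-mean, unit-variance filters, scaled by gain / sqrt(fan_in)
        # so activations stay well-behaved without BatchNorm.
        w_hat = self.gain * (w - mean) / torch.sqrt(var * fan_in + 1e-8)
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


if __name__ == "__main__":
    # Signal-propagation probe: feed unit-Gaussian input through a stack
    # of conv + ReLU layers and watch the average squared channel mean
    # and the average channel variance as depth increases.
    torch.manual_seed(0)
    x = torch.randn(8, 32, 56, 56)
    for depth in range(1, 6):
        layer = ScaledStdConv2d(32, 32, kernel_size=3, padding=1, bias=False)
        x = F.relu(layer(x))
        sq_mean = x.mean(dim=(0, 2, 3)).pow(2).mean().item()
        var = x.var(dim=(0, 2, 3)).mean().item()
        print(f"layer {depth}: avg squared channel mean = {sq_mean:.4f}, "
              f"avg channel variance = {var:.4f}")
```

Because the standardization is recomputed from the current weights on every forward pass, it constrains the statistics of the convolution itself rather than normalizing activations across a batch, which is what allows such networks to drop Batch Normalization without the usual degradation.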


