High-Performance Large-Scale Image Recognition Without Normalization

02/11/2021
by Andrew Brock, et al.

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and they are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets.
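The adaptive gradient clipping (AGC) technique mentioned above clips each gradient based on the ratio of its norm to the norm of the corresponding parameter, computed unit-wise (per output channel for weights, per tensor for biases and gains). Below is a minimal PyTorch sketch of this idea, assuming the unit-wise formulation described in the paper; the function names and the clip_factor/eps defaults are illustrative rather than the authors' exact implementation, whose official JAX code lives in the repository linked above.

import torch

def unitwise_norm(x):
    # Frobenius norm per output unit: a single scalar for biases and
    # gains, one norm per row/filter for weight matrices and conv kernels.
    if x.ndim <= 1:
        return x.norm()
    return x.norm(dim=tuple(range(1, x.ndim)), keepdim=True)

def adaptive_grad_clip(parameters, clip_factor=0.01, eps=1e-3):
    # AGC: rescale a gradient unit whenever its norm exceeds
    # clip_factor times the norm of the matching parameter unit.
    for p in parameters:
        if p.grad is None:
            continue
        p_norm = unitwise_norm(p.detach()).clamp_(min=eps)
        g_norm = unitwise_norm(p.grad.detach())
        max_norm = p_norm * clip_factor
        scale = max_norm / g_norm.clamp(min=1e-6)
        # Only shrink units that exceed the threshold; leave the rest unchanged.
        p.grad.detach().mul_(
            torch.where(g_norm > max_norm, scale, torch.ones_like(scale))
        )

In a training loop, such a function would be called between loss.backward() and optimizer.step(), so the clipped gradients are what the optimizer applies.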

Related research

03/03/2019 - Accelerating Training of Deep Neural Networks with a Standardization Loss
A significant advance in accelerating neural network training has been t...

02/11/2015 - Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Training Deep Neural Networks is complicated by the fact that the distri...

11/16/2018 - Image Classification at Supercomputer Scale
Deep learning is extremely computationally intensive, and hardware vendo...

06/07/2021 - Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Much recent research has been dedicated to improving the efficiency of t...

05/20/2022 - Kernel Normalized Convolutional Networks
Existing deep convolutional neural network (CNN) architectures frequentl...

03/27/2023 - Sigmoid Loss for Language Image Pre-Training
We propose a simple pairwise sigmoid loss for image-text pre-training. U...

09/24/2017 - Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
Batch normalization (BN) has become a de facto standard for training dee...
