Revisiting ResNets: Improved Training and Scaling Strategies

by Irwan Bello, et al.

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.
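
Scaling strategy (1) above can be read as a simple decision rule. The sketch below is only an illustration of that rule as the abstract states it; the function name `choose_scaling_axis` and the boolean flag `overfitting_can_occur` are hypothetical, and the abstract does not say how to decide whether a given training regime is prone to overfitting.

```python
def choose_scaling_axis(overfitting_can_occur: bool) -> str:
    """Pick which dimension to scale, following strategy (1) in the abstract.

    Hypothetical helper: the flag must be set by the practitioner, since the
    abstract gives no criterion for detecting an overfitting-prone regime.
    """
    # Depth scaling in regimes where overfitting can occur,
    # width scaling otherwise.
    return "depth" if overfitting_can_occur else "width"


# Illustrative calls only; the regimes are placeholders, not results.
print(choose_scaling_axis(True))
print(choose_scaling_axis(False))
```

Strategy (2), increasing image resolution more slowly than previously recommended, is not encoded here because the abstract gives no numeric schedule for it.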




Revisiting 3D ResNets for Video Recognition

A recent work from Bello shows that training and scaling strategies may ...

Simple Training Strategies and Model Scaling for Object Detection

The speed-accuracy Pareto curve of object detection systems have advance...

PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies

PointNet++ is one of the most influential neural architectures for point...

ResNet strikes back: An improved training procedure in timm

The influential Residual Networks designed by He et al. remain the gold-...

Delving Deeper into Data Scaling in Masked Image Modeling

Understanding whether self-supervised learning methods can scale with un...

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

For the past 5 years, the ILSVRC competition and the ImageNet dataset ha...

Fast and Accurate Model Scaling

In this work we analyze strategies for convolutional neural network scal...
