Revisiting the Train Loss: an Efficient Performance Estimator for Neural Architecture Search

by Binxin Ru, et al.
University of Oxford
Imperial College London

Reliable yet efficient evaluation of the generalisation performance of a proposed architecture is crucial to the success of neural architecture search (NAS). Traditional approaches face a variety of limitations: training each architecture to completion is prohibitively expensive, early-stopping estimates may correlate poorly with fully trained performance, and model-based estimators require large training sets. Instead, motivated by recent results linking training speed and generalisation with stochastic gradient descent, we propose to estimate the final test performance from the sum of training losses. Our estimator is inspired by the marginal likelihood, which is used for Bayesian model selection. It is model-free, simple, efficient, and cheap to implement, and requires neither hyperparameter tuning nor surrogate training before deployment. We demonstrate empirically that our estimator consistently outperforms other baselines and can achieve a rank correlation of 0.95 with final test accuracy on the NAS-Bench-201 dataset within 50 epochs.
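
To make the idea in the abstract concrete, the following is a minimal sketch of scoring architectures by the sum of their training losses and checking how well that score ranks them against final test accuracy. It is not the authors' implementation. The function names, the per-epoch granularity of the loss curve, the sign convention (negating the sum so that a higher score means a better expected architecture), and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def sum_of_training_losses(loss_curve: Sequence[float], num_epochs: int) -> float:
    """Sum the training losses recorded over the first `num_epochs` epochs."""
    return float(sum(loss_curve[:num_epochs]))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rank_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-epoch training losses for three made-up architectures.
loss_curves: Dict[str, List[float]] = {
    "arch_a": [2.0, 1.2, 0.8, 0.5],
    "arch_b": [2.1, 1.6, 1.3, 1.1],
    "arch_c": [1.9, 1.0, 0.6, 0.3],
}
# Hypothetical final test accuracies after full training (toy values).
final_test_acc: Dict[str, float] = {"arch_a": 0.91, "arch_b": 0.85, "arch_c": 0.93}

names = sorted(loss_curves)
# Negate the sum so that faster-training architectures receive higher scores.
scores = [-sum_of_training_losses(loss_curves[n], num_epochs=4) for n in names]
accuracies = [final_test_acc[n] for n in names]
rho = spearman_rank_correlation(scores, accuracies)
print(f"rank correlation on toy data: {rho:.2f}")
```

On this toy data the architecture whose loss falls fastest also has the highest final accuracy, so the two rankings agree. In a real search, the loss curves would come from a short training run of each candidate, and the rank correlation would be measured against a benchmark of fully trained results.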




Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search

In this work, we show that simultaneously training and mixing neural net...

Neural Architecture Search using Bayesian Optimisation with Weisfeiler-Lehman Kernel

Bayesian optimisation (BO) has been widely used for hyperparameter optim...

Accelerating Neural Architecture Search using Performance Prediction

Methods for neural network hyperparameter optimization and meta-modeling...

DARTS without a Validation Set: Optimizing the Marginal Likelihood

The success of neural architecture search (NAS) has historically been li...

Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search

While existing work on neural architecture search (NAS) tunes hyperparam...

DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity

Differentiable architecture search (DARTS) is a widely researched tool f...

Bayesian Model Selection, the Marginal Likelihood, and Generalization

How do we compare between hypotheses that are entirely consistent with o...

