AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Neural architecture search (NAS) has shown great promise designing state-of-the-art (SOTA) models that are both accurate and fast. Recently, two-stage NAS, e.g. BigNAS, decouples the model training and searching process and achieves good search efficiency. Two-stage NAS requires sampling from the search space during training, which directly impacts the accuracy of the final searched models. While uniform sampling has been widely used for simplicity, it is agnostic of the model performance Pareto front, which are the main focus in the search process, and thus, misses opportunities to further improve the model accuracy. In this work, we propose AttentiveNAS that focuses on sampling the networks to improve the performance Pareto. We also propose algorithms to efficiently and effectively identify the networks on the Pareto during training. Without extra re-training or post-processing, we can simultaneously obtain a large number of networks across a wide range of FLOPs. Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3 on ImageNet, and outperforms SOTA models, including BigNAS, Once-for-All networks and FBNetV3. We also achieve ImageNet accuracy of 80.1 MFLOPs.
READ FULL TEXT