SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

by Li Lyna Zhang, et al.

The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency is due to the quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in prior-art search spaces lead to diverse quantization efficiency and can slow down INT8 inference. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. The key idea of SpaceEvo is to automatically search for hardware-preferred operators and configurations to construct the search space, guided by a metric called the Q-T score that quantifies how quantization-friendly a candidate search space is. We further train a quantized-for-all supernet over our discovered search space, enabling the searched models to be deployed directly without extra retraining or quantization. Our discovered models establish new state-of-the-art INT8 quantized accuracy under various latency constraints, achieving up to 10.1% higher ImageNet accuracy than prior-art CNNs under the same latency. Extensive experiments on diverse edge devices demonstrate that SpaceEvo consistently outperforms existing manually designed search spaces, with up to 2.5x faster speed at the same accuracy.
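The abstract describes evolving a search space itself (rather than a single architecture), with candidates ranked by how quantization-friendly they are. Below is a minimal, hypothetical sketch of that idea: an evolutionary loop over candidate search spaces, each scored by a mocked stand-in for the paper's Q-T score. The operator pool, width choices, scoring weights, and latency proxy are all illustrative assumptions, not the paper's actual formulation.

```python
import random

# Illustrative operator and channel-width pools; SpaceEvo's real candidate
# pool comes from profiling INT8 efficiency on the target hardware.
OPERATORS = ["conv3x3", "mbconv", "dwconv3x3", "dwconv5x5"]
WIDTHS = [16, 24, 32, 48, 64]


def random_space(num_stages=4):
    """A candidate search space: one (operator, width) choice per stage."""
    return [(random.choice(OPERATORS), random.choice(WIDTHS))
            for _ in range(num_stages)]


def q_t_score(space):
    """Mocked stand-in for the Q-T score: rewards quantization-friendly
    operators and penalizes an (invented) INT8 latency proxy."""
    friendliness = {"conv3x3": 1.0, "mbconv": 0.9,
                    "dwconv3x3": 0.6, "dwconv5x5": 0.4}
    quality = sum(friendliness[op] for op, _ in space)
    latency = sum(0.01 * width for _, width in space)  # toy latency model
    return quality - latency


def mutate(space):
    """Resample the (operator, width) choice of one random stage."""
    child = list(space)
    i = random.randrange(len(child))
    child[i] = (random.choice(OPERATORS), random.choice(WIDTHS))
    return child


def evolve(pop_size=16, generations=20, seed=0):
    """Evolve candidate search spaces toward a higher Q-T score."""
    random.seed(seed)
    population = [random_space() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=q_t_score, reverse=True)
        parents = population[: pop_size // 2]          # keep the top half
        children = [mutate(random.choice(parents)) for _ in parents]
        population = parents + children
    return max(population, key=q_t_score)


best = evolve()
```

In the paper this loop would be far more expensive: scoring a candidate search space involves estimating both the accuracy of architectures sampled from it and their measured INT8 latency on the target device, which is why an efficient quality/latency proxy is central to the method.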



