PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

by   Zhikai Li, et al.

Data-free quantization can potentially address data privacy and security concerns in model compression, and thus has been widely investigated. Recently, PSAQ-ViT designs a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), achieving the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without the auxiliary category guidance, we employ the task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models on image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with the naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline on data-free quantization for ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13 top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that accurate and general PSAQ-ViT V2 can serve as a potential and practice solution in real-world applications involving sensitive data. Code will be released and merged at:


page 1

page 4

page 8

page 9


Patch Similarity Aware Data-Free Quantization for Vision Transformers

Vision transformers have recently gained great success on various comput...

Quantized Feature Distillation for Network Quantization

Neural network quantization aims to accelerate and trim full-precision n...

FQ-ViT: Fully Quantized Vision Transformer without Retraining

Network quantization significantly reduces model inference complexity an...

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

The large pre-trained vision transformers (ViTs) have demonstrated remar...

Training data-efficient image transformers distillation through attention

Recently, neural networks purely based on attention were shown to addres...

Quantization Mimic: Towards Very Tiny CNN for Object Detection

In this paper, we propose a simple and general framework for training ve...

Learning in School: Multi-teacher Knowledge Inversion for Data-Free Quantization

User data confidentiality protection is becoming a rising challenge in t...

Please sign up or login with your details

Forgot password? Click here to reset