BinaryViT: Towards Efficient and Accurate Binary Vision Transformers

by   Junrui Xiao, et al.

Vision Transformers (ViTs) have emerged as the fundamental architecture for most computer vision fields, but the considerable memory and computation costs hinders their application on resource-limited devices. As one of the most powerful compression methods, binarization reduces the computation of the neural network by quantizing the weights and activation values as ±1. Although existing binarization methods have demonstrated excellent performance on Convolutional Neural Networks (CNNs), the full binarization of ViTs is still under-studied and suffering a significant performance drop. In this paper, we first argue empirically that the severe performance degradation is mainly caused by the weight oscillation in the binarization training and the information distortion in the activation of ViTs. Based on these analyses, we propose BinaryViT, an accurate full binarization scheme for ViTs, which pushes the quantization of ViTs to the limit. Specifically, we propose a novel gradient regularization scheme (GRS) for driving a bimodal distribution of the weights to reduce oscillation in binarization training. Moreover, we design an activation shift module (ASM) to adaptively tune the activation distribution to reduce the information distortion caused by binarization. Extensive experiments on ImageNet dataset show that our BinaryViT consistently surpasses the strong baseline by 2.05 binarized ViTs to a usable level. Furthermore, our method achieves impressive savings of 16.2× and 17.7× in model size and OPs compared to the full-precision DeiT-S. The codes and models will be released on github.


Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

The large pre-trained vision transformers (ViTs) have demonstrated remar...

PTQ4ViT: Post-Training Quantization Framework for Vision Transformers

Quantization is one of the most effective methods to compress neural net...

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

The complicated architecture and high training cost of vision transforme...

Mesa: A Memory-saving Training Framework for Transformers

There has been an explosion of interest in designing high-performance Tr...

Bi-ViT: Pushing the Limit of Vision Transformer Quantization

Vision transformers (ViTs) quantization offers a promising prospect to f...

Basic Binary Convolution Unit for Binarized Image Restoration Network

Lighter and faster image restoration (IR) models are crucial for the dep...

Boosting Binary Neural Networks via Dynamic Thresholds Learning

Developing lightweight Deep Convolutional Neural Networks (DCNNs) and Vi...

Please sign up or login with your details

Forgot password? Click here to reset