Towards Efficient Visual Adaption via Structural Re-parameterization

by Gen Luo, et al.

Parameter-efficient transfer learning (PETL) is an emerging research topic aimed at inexpensively adapting large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in reducing storage costs for various vision tasks by updating or injecting a small number of parameters instead of fully fine-tuning the model. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we prove that the adaption modules, even with a complex structure, can be seamlessly integrated into most giant vision models via structural re-parameterization. This property makes RepAdapter zero-cost during inference. In addition to its computational efficiency, RepAdapter is more effective and lightweight than existing PETL methods due to its sparse structure and our careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets covering three vision tasks, i.e., image classification, video classification and semantic segmentation. Experimental results show that RepAdapter outperforms state-of-the-art PETL methods in both performance and efficiency. For instance, by updating only 0.6% of the parameters, we can improve the performance of ViT from 38.8 to 55.1 on Sun397. Its generalizability is also well validated on a range of vision models, i.e., ViT, CLIP, Swin-Transformer and ConvNeXt. Our source code is released at
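
The merging idea behind structural re-parameterization can be illustrated with a minimal sketch. It assumes a purely linear residual adapter, x + scale * (x W_a + b_a), placed directly in front of a frozen linear layer; the function names, the scale factor and the dense adapter shape are illustrative assumptions and are not taken from the RepAdapter implementation. Under these assumptions the adapter folds into the layer as W' = (I + scale * W_a) W and b' = scale * b_a W + b, so inference needs only one matrix multiplication.

```python
import random

def matmul(a, b):
    # Plain-Python matrix product of two row-major nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    # Add a bias vector to every row of a matrix.
    return [[v + bv for v, bv in zip(row, bias)] for row in m]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def adapter_then_layer(x, w_a, b_a, scale, w, b):
    # Training-time path: residual linear adapter, then the frozen layer.
    delta = add_bias(matmul(x, w_a), b_a)
    adapted = [[xv + scale * dv for xv, dv in zip(xr, dr)]
               for xr, dr in zip(x, delta)]
    return add_bias(matmul(adapted, w), b)

def merge(w_a, b_a, scale, w, b):
    # Fold the adapter into the layer: W' = (I + s*W_a) W, b' = s*b_a W + b.
    d = len(w_a)
    a_eff = [[iv + scale * av for iv, av in zip(ir, ar)]
             for ir, ar in zip(identity(d), w_a)]
    w_merged = matmul(a_eff, w)
    b_shift = matmul([[scale * v for v in b_a]], w)[0]
    b_merged = [bs + bv for bs, bv in zip(b_shift, b)]
    return w_merged, b_merged

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes, chosen only for the demonstration.
d_in, d_out, batch, scale = 4, 3, 2, 0.5
x = rand_matrix(batch, d_in)
w_a, b_a = rand_matrix(d_in, d_in), rand_matrix(1, d_in)[0]
w, b = rand_matrix(d_in, d_out), rand_matrix(1, d_out)[0]

two_step = adapter_then_layer(x, w_a, b_a, scale, w, b)
w_m, b_m = merge(w_a, b_a, scale, w, b)
one_step = add_bias(matmul(x, w_m), b_m)

# Largest elementwise gap between the two paths; it should be at rounding level.
max_diff = max(abs(p - q) for rp, rq in zip(two_step, one_step)
               for p, q in zip(rp, rq))
print(max_diff)
```

The merge is exact only because no non-linearity sits between the adapter and the layer it feeds; with an activation in between, the two maps could not be collapsed into a single weight matrix.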




Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models

With ever increasing parameters and computation, vision-language pre-tra...

Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

Parameter efficient transfer learning (PETL) is an emerging research spo...

Online Convolutional Re-parameterization

Structural re-parameterization has drawn increasing attention in various...

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Although the pre-trained Vision Transformers (ViTs) achieved great succe...

Learnable Parameter Similarity

Most of the existing approaches focus on specific visual tasks while ign...

G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks

It has become a popular paradigm to transfer the knowledge of large-scal...

MiniVLM: A Smaller and Faster Vision-Language Model

Recent vision-language (VL) studies have shown remarkable progress by le...
