Tuning Pre-trained Model via Moment Probing

by Mingze Gao, et al.

Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP), as a fundamental module, exploits the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a small number of learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head on the mean of the final features (e.g., word tokens for ViT) or the classification token, our MP performs linear classification on the feature distribution, which provides stronger representation ability by exploiting the richer statistical information inherent in the features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated using the first- and second-order moments of the features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC^3) to compute second-order moments in an efficient and effective manner. Considering that MP may also affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP_+. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at lower training cost, while our MP_+ achieves state-of-the-art performance.
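The core idea above can be sketched in a few lines: instead of classifying on the mean token alone, concatenate the first-order moment (mean) with a multi-head second-order moment (per-head cross-covariance of centered features) and feed the joint feature to a linear head. This is a minimal illustrative sketch, not the paper's exact MHC^3 implementation; the function name `moment_probe`, the head count, and all shapes are assumptions for the example.

```python
import numpy as np

def moment_probe(tokens, W, b, num_heads=4):
    """Sketch of Moment Probing: linear classification on first- and
    second-order moments of token features (illustrative only)."""
    n, d = tokens.shape
    mu = tokens.mean(axis=0)                 # first-order moment, shape (d,)
    centered = tokens - mu                   # center features before covariance
    dh = d // num_heads                      # channels per head
    heads = centered.reshape(n, num_heads, dh)
    # per-head second-order moment (cross-covariance), shape (num_heads, dh, dh);
    # splitting into heads keeps the quadratic term cheap: heads*(d/heads)^2 << d^2
    cov = np.einsum('nhi,nhj->hij', heads, heads) / n
    feat = np.concatenate([mu, cov.ravel()]) # joint moment representation
    return feat @ W + b                      # linear classification head

# Toy usage: 16 tokens of dimension 8, 3 classes.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
feat_dim = 8 + 4 * 2 * 2                     # d + num_heads*(d/num_heads)^2 = 24
W = rng.normal(size=(feat_dim, 3))
b = np.zeros(3)
logits = moment_probe(tokens, W, b)
print(logits.shape)                          # (3,)
```

Note the efficiency motivation: a full covariance costs O(d^2) output dimensions, while the multi-head split reduces this to num_heads*(d/num_heads)^2, mirroring why the paper computes second-order moments head-wise.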



