Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs

10/20/2021
by Kaichao You, et al.

Pre-trained model hubs containing many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at high cost, they remain under-exploited: practitioners usually pick one PTM from the hub by popularity and then fine-tune it on the target task. This naïve but common practice poses two obstacles to fully exploiting pre-trained model hubs: (1) the PTM selection procedure has no optimality guarantee; (2) only one PTM is used while the remaining PTMs are ignored. Ideally, maximally exploiting a model hub would require trying all combinations of PTMs and extensively fine-tuning each combination, which incurs exponentially many combinations and an unaffordable computational budget. In this paper, we propose a new paradigm for exploiting model hubs by ranking and tuning pre-trained models: (1) our conference work <cit.> proposed LogME, which estimates the maximum value of label evidence given features extracted by a pre-trained model and can rank all the PTMs in a hub, across various types of PTMs and tasks, before fine-tuning; (2) the best-ranked PTM can then be fine-tuned and deployed if we have no preference for the model's architecture, or the target PTM can be tuned with the top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part builds on the conference paper, and we complete its theoretical analysis in this paper, including a convergence proof of the heuristic evidence-maximization procedure and an analysis of the influence of feature dimension. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for tuning with multiple PTMs; it surpasses dedicated methods designed for tuning homogeneous PTMs and sets a new state of the art for tuning heterogeneous PTMs. We believe this new paradigm of exploiting PTM hubs will interest a large portion of the community.
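To make the ranking step concrete, below is a minimal NumPy sketch of the evidence-maximization score described above: a linear head on top of frozen PTM features is given a Gaussian prior and Gaussian noise, the head is integrated out, the two precision hyperparameters are optimized by fixed-point iteration, and each PTM is scored by the resulting per-sample log evidence averaged over one-hot class targets. The function names, the convergence tolerance, and the `rank_ptms` wrapper are illustrative choices of ours, not the paper's official implementation.

```python
import numpy as np

def log_evidence(f, y, tol=1e-3, max_iter=100):
    """Per-sample log marginal evidence of targets y (n,) given features f (n, d),
    maximized over the prior precision alpha and noise precision beta
    by fixed-point iteration (the quantity used to score a PTM)."""
    n, d = f.shape
    u, s, _ = np.linalg.svd(f, full_matrices=False)   # f = u @ diag(s) @ vh
    sigma = s ** 2                                    # squared singular values
    z2 = (u.T @ y) ** 2                               # projections of y onto left singular vectors
    delta = max(y @ y - z2.sum(), 0.0)                # target energy outside the span of f

    alpha, beta = 1.0, 1.0
    for _ in range(max_iter):
        gamma = (beta * sigma / (alpha + beta * sigma)).sum()                 # effective dimension
        m2 = (beta ** 2 * sigma * z2 / (alpha + beta * sigma) ** 2).sum()     # ||posterior mean||^2
        res = (alpha ** 2 * z2 / (alpha + beta * sigma) ** 2).sum() + delta   # residual ||f @ m - y||^2
        alpha_new, beta_new = gamma / (m2 + 1e-12), (n - gamma) / (res + 1e-12)
        converged = abs(alpha_new - alpha) / alpha < tol and abs(beta_new - beta) / beta < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            break

    # Evaluate the evidence at the converged (alpha, beta).
    m2 = (beta ** 2 * sigma * z2 / (alpha + beta * sigma) ** 2).sum()
    res = (alpha ** 2 * z2 / (alpha + beta * sigma) ** 2).sum() + delta
    logdet = np.log(alpha + beta * sigma).sum() + (d - len(sigma)) * np.log(alpha)
    evidence = (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta) - 0.5 * logdet
                - 0.5 * beta * res - 0.5 * alpha * m2 - 0.5 * n * np.log(2 * np.pi))
    return evidence / n


def rank_ptms(features_by_ptm, labels):
    """Rank candidate PTMs by the average score over one-hot class targets.
    `features_by_ptm` maps a PTM name to its (n, d) features on the target data."""
    onehot = np.eye(labels.max() + 1)[labels]
    scores = {name: np.mean([log_evidence(f, onehot[:, c]) for c in range(onehot.shape[1])])
              for name, f in features_by_ptm.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the proposed paradigm, the top entry returned by a ranking like this would be fine-tuned directly, or the top-K entries would be carried forward into the B-Tuning stage.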

