Scalable Transfer Learning with Expert Models

by   Joan Puigcerver, et al.

Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.


page 2

page 15

page 16

page 17

page 19

page 20

page 27


Scalable Neural Data Server: A Data Recommender for Transfer Learning

Absence of large-scale labeled data in the practitioner's target domain ...

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

Parameter-Efficient Transfer Learning (PETL) aims at efficiently adaptin...

Large Scale Learning of General Visual Representations for Transfer

Transfer of pre-trained representations improves sample efficiency and s...

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

Pre-trained machine learning (ML) models have shown great performance fo...

Towards Compute-Optimal Transfer Learning

The field of transfer learning is undergoing a significant shift with th...

Which Model to Transfer? Finding the Needle in the Growing Haystack

Transfer learning has been recently popularized as a data-efficient alte...

Diversified Dynamic Routing for Vision Tasks

Deep learning models for vision tasks are trained on large datasets unde...

Please sign up or login with your details

Forgot password? Click here to reset