Don't Stop Learning: Towards Continual Learning for the CLIP Model

by Yuxuan Ding, et al.

The Contrastive Language-Image Pre-training (CLIP) model is a recently proposed large-scale pre-trained model that has attracted increasing attention in the computer vision community. Benefiting from its gigantic image-text training set, the CLIP model has learned outstanding capabilities in zero-shot learning and image-text matching. To boost the recognition performance of CLIP on certain target visual concepts, it is often desirable to further update the model by fine-tuning on extra training data for the classes of interest. This operation, however, raises an important concern: will the update hurt the zero-shot learning or image-text matching capability of CLIP, i.e., cause catastrophic forgetting? If so, could existing continual learning algorithms be adapted to alleviate this risk? To answer these questions, this work conducts a systematic study of the continual learning problem for the CLIP model. We construct evaluation protocols to measure the impact of fine-tuning updates and explore different ways to upgrade existing continual learning methods to mitigate the forgetting issue of the CLIP model. Our study reveals the particular challenges of CLIP continual learning and lays a foundation for further research. Moreover, we propose a new algorithm, dubbed Learning without Forgetting via Replayed Vocabulary (VR-LwF), which is shown to be effective in alleviating the forgetting issue of the CLIP model.
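The abstract does not spell out the VR-LwF objective, but the name suggests a Learning-without-Forgetting-style distillation in which the frozen original CLIP model acts as a teacher over a "replayed vocabulary" of text embeddings. The sketch below is an illustration of that general idea, not the paper's exact method: the function names, shapes, temperature, and loss weighting are all assumptions.

```python
import torch
import torch.nn.functional as F

def lwf_replayed_vocab_loss(student_img_emb, teacher_img_emb, vocab_txt_emb,
                            task_logits, labels, tau=2.0, alpha=1.0):
    """Hypothetical LwF-style loss for CLIP fine-tuning (illustrative only).

    - student_img_emb: image embeddings from the model being fine-tuned, (B, D)
    - teacher_img_emb: image embeddings from the frozen original CLIP, (B, D)
    - vocab_txt_emb:   text embeddings of a replayed vocabulary, (V, D)
    - task_logits:     classifier logits for the fine-tuning task, (B, C)
    - labels:          ground-truth labels for the fine-tuning task, (B,)
    """
    # Image-to-vocabulary similarity logits for student and frozen teacher.
    s_logits = student_img_emb @ vocab_txt_emb.t() / tau
    t_logits = teacher_img_emb @ vocab_txt_emb.t() / tau

    # Distillation term: keep the student's similarity distribution over the
    # replayed vocabulary close to the teacher's, to preserve zero-shot behavior.
    distill = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1),
                       reduction="batchmean")

    # Standard cross-entropy on the new classes of interest.
    ce = F.cross_entropy(task_logits, labels)
    return ce + alpha * distill
```

In this reading, the replayed vocabulary supplies pseudo-targets without requiring any stored images, which is what would distinguish it from replay methods that cache old training data.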


