Preserving In-Context Learning ability in Large Language Model Fine-tuning

by Yihan Wang, et al.

Pretrained large language models (LLMs) are strong in-context learners that can perform few-shot learning without changing model parameters. However, as we show, fine-tuning an LLM on any specific task generally destroys its in-context ability. We identify an important cause of this loss, format specialization, in which the model overfits to the format of the fine-tuned task and is unable to output anything beyond this format. We further show that format specialization happens at the beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that preserves the in-context abilities of the pretrained model. ProMoT first trains a soft prompt for the fine-tuning target task, and then fine-tunes the model itself with this soft prompt attached. ProMoT offloads task-specific formats into the soft prompt, which can be removed when doing other in-context tasks. We fine-tune mT5 XXL with ProMoT on natural language inference (NLI) and English-French translation and evaluate the in-context abilities of the resulting models on 8 different NLP tasks. ProMoT achieves performance similar to vanilla fine-tuning on the fine-tuned tasks, but with a much smaller reduction in in-context learning performance across the board. More importantly, ProMoT shows remarkable generalization ability on tasks that have different formats, e.g., fine-tuning on an NLI binary classification task improves the model's in-context ability to do summarization (+0.53 ROUGE-2 score compared to the pretrained model), making ProMoT a promising method for building general-purpose capabilities such as grounding and reasoning into LLMs with small but high-quality datasets. When extended to sequential or multi-task training, ProMoT can achieve even better out-of-domain generalization performance.
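
The two-stage procedure described in the abstract can be illustrated with a deliberately tiny analogy. The sketch below is not the authors' implementation: a one-dimensional linear "model" stands in for the LLM, a scalar added to its output stands in for the soft prompt, and a constant offset in the targets stands in for the task's output format. All names (`predict`, `train`, `FORMAT_OFFSET`, `TASK_SLOPE`) are hypothetical, and keeping the prompt frozen during the second stage is an assumption of this sketch.

```python
# Toy analogy only: a linear "model" (w, b) stands in for the LLM and a scalar
# "prompt" stands in for the soft prompt. Nothing here comes from the paper's code.

XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
FORMAT_OFFSET = 5.0  # hypothetical stand-in for the task's output format
TASK_SLOPE = 2.0     # hypothetical stand-in for the task's actual content
YS = [TASK_SLOPE * x + FORMAT_OFFSET for x in XS]


def predict(model, prompt, x):
    """Model output with the (possibly zero) prompt attached."""
    return model["w"] * x + model["b"] + prompt


def train(model, prompt, train_model, train_prompt, steps=200, lr=0.1):
    """Gradient descent on mean squared error, updating only the selected parts."""
    model = dict(model)  # copy so the caller's parameters are left untouched
    n = len(XS)
    for _ in range(steps):
        errs = [predict(model, prompt, x) - y for x, y in zip(XS, YS)]
        grad_bias = 2.0 * sum(errs) / n  # gradient w.r.t. both b and the prompt
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, XS)) / n
        if train_prompt:
            prompt -= lr * grad_bias
        if train_model:
            model["w"] -= lr * grad_w
            model["b"] -= lr * grad_bias
    return model, prompt


pretrained = {"w": 1.0, "b": 0.0}

# Vanilla fine-tuning: the model itself has to absorb the format offset.
vanilla_model, _ = train(pretrained, 0.0, train_model=True, train_prompt=False)

# Stage 1: train only the prompt, with the model frozen.
_, soft_prompt = train(pretrained, 0.0, train_model=False, train_prompt=True)

# Stage 2: fine-tune the model with the trained prompt attached
# (the prompt is kept frozen here, which is an assumption of this sketch).
promot_model, _ = train(pretrained, soft_prompt, train_model=True, train_prompt=False)

print("vanilla model bias:", vanilla_model["b"])
print("two-stage model bias:", promot_model["b"], "prompt:", soft_prompt)
```

In this toy setting the offset ends up in the model's bias under vanilla fine-tuning, whereas in the two-stage run it ends up in the prompt, so calling `predict(promot_model, 0.0, x)` with the prompt removed no longer carries the offset. That mirrors, very loosely, the idea of offloading the task format into a removable soft prompt.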



Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Pre-trained neural language models bring significant improvement for var...

In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning

In this note, we explore inference-time alignment through in-context lea...

TART: A plug-and-play Transformer module for task-agnostic reasoning

Large language models (LLMs) exhibit in-context learning abilities which...

Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC

The purpose of this project is to evaluate three language models named B...

Resources and Few-shot Learners for In-context Learning in Slavic Languages

Despite the rapid recent progress in creating accurate and compact in-co...

CoreLM: Coreference-aware Language Model Fine-Tuning

Language Models are the underpin of all modern Natural Language Processi...

Exploring and Evaluating Personalized Models for Code Generation

Large Transformer models achieved the state-of-the-art status for Natura...
