Preserving In-Context Learning ability in Large Language Model Fine-tuning

11/01/2022
by   Yihan Wang, et al.
5

Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-shot learning without changing model parameters. However, as we show, fine-tuning an LLM on any specific task generally destroys its in-context ability. We discover an important cause of this loss, format specialization, where the model overfits to the format of the fine-tuned task and is unable to output anything beyond this format. We further show that format specialization happens at the beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that preserves in-context abilities of the pretrained model. ProMoT first trains a soft prompt for the fine-tuning target task, and then fine-tunes the model itself with this soft prompt attached. ProMoT offloads task-specific formats into the soft prompt that can be removed when doing other in-context tasks. We fine-tune mT5 XXL with ProMoT on natural language inference (NLI) and English-French translation and evaluate the in-context abilities of the resulting models on 8 different NLP tasks. ProMoT achieves similar performance on the fine-tuned tasks compared with vanilla fine-tuning, but with much less reduction of in-context learning performances across the board. More importantly, ProMoT shows remarkable generalization ability on tasks that have different formats, e.g. fine-tuning on a NLI binary classification task improves the model's in-context ability to do summarization (+0.53 Rouge-2 score compared to the pretrained model), making ProMoT a promising method to build general purpose capabilities such as grounding and reasoning into LLMs with small but high quality datasets. When extended to sequential or multi-task training, ProMoT can achieve even better out-of-domain generalization performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Pre-trained neural language models bring significant improvement for var...
research
08/08/2023

In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning

In this note, we explore inference-time alignment through in-context lea...
research
06/13/2023

TART: A plug-and-play Transformer module for task-agnostic reasoning

Large language models (LLMs) exhibit in-context learning abilities which...
research
01/15/2021

Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC

The purpose of this project is to evaluate three language models named B...
research
04/04/2023

Resources and Few-shot Learners for In-context Learning in Slavic Languages

Despite the rapid recent progress in creating accurate and compact in-co...
research
11/04/2021

CoreLM: Coreference-aware Language Model Fine-Tuning

Language Models are the underpin of all modern Natural Language Processi...
research
08/29/2022

Exploring and Evaluating Personalized Models for Code Generation

Large Transformer models achieved the state-of-the-art status for Natura...

Please sign up or login with your details

Forgot password? Click here to reset