Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning

by Yujia Qin, et al.
Tsinghua University

How can pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of NLP tasks that differ widely on the surface? In this work, we find empirical evidence that the adaptation of PLMs to various tasks can be reparameterized as optimizing only a few free parameters in a common low-dimensional intrinsic task subspace, which may help explain why PLMs adapt so easily to various NLP tasks with small-scale data. Specifically, to find such a subspace and examine its universality, we build on the recent success of prompt tuning and decompose the soft prompts of multiple NLP tasks into the same low-dimensional nonlinear subspace; we then learn to adapt the PLM to unseen tasks or data by tuning only the parameters in that subspace. We dub this pipeline intrinsic prompt tuning (IPT). In experiments, we study diverse few-shot NLP tasks and surprisingly find that, in a 5-dimensional subspace found with 100 random tasks, tuning only 5 free parameters recovers 87% and 65% of the full prompt tuning performance for 100 seen tasks (using different training data) and 20 unseen tasks, respectively, showing the strong generalization ability of the found intrinsic task subspace. Besides serving as an analysis tool, IPT could also bring practical benefits, such as improving the stability of prompt tuning.
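
The core idea of the abstract, tuning only a handful of intrinsic parameters and mapping them through a frozen nonlinear projection to a full soft prompt, can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the decoder weights `W1` and `W2` are random stand-ins for a projection that would be learned from the training tasks, the sizes `PROMPT_TOKENS`, `HIDDEN_SIZE`, and `HIDDEN_UNITS` are hypothetical, and a squared error against a target prompt replaces the task loss that would normally be computed through the frozen PLM.

```python
import numpy as np

rng = np.random.default_rng(0)

INTRINSIC_DIM = 5    # free parameters tuned per task (the 5 from the abstract)
PROMPT_TOKENS = 4    # hypothetical soft-prompt length
HIDDEN_SIZE = 8      # hypothetical PLM embedding width
HIDDEN_UNITS = 16    # hypothetical decoder width
PROMPT_DIM = PROMPT_TOKENS * HIDDEN_SIZE

# Frozen nonlinear decoder. Assumption: random weights stand in for a
# projection learned jointly over many training tasks.
W1 = rng.normal(scale=1 / np.sqrt(INTRINSIC_DIM), size=(HIDDEN_UNITS, INTRINSIC_DIM))
W2 = rng.normal(scale=1 / np.sqrt(HIDDEN_UNITS), size=(PROMPT_DIM, HIDDEN_UNITS))


def decode(z):
    """Map an intrinsic vector z of shape (5,) to a flattened soft prompt."""
    return W2 @ np.tanh(W1 @ z)


def loss_and_grad(z, target):
    """Toy squared-error loss and its gradient with respect to z only."""
    h = np.tanh(W1 @ z)
    residual = W2 @ h - target
    loss = 0.5 * float(residual @ residual)
    grad_h = W2.T @ residual
    grad_z = W1.T @ (grad_h * (1.0 - h ** 2))  # chain rule through tanh
    return loss, grad_z


# Toy "task": a target prompt that lies inside the subspace.
z_true = rng.normal(size=INTRINSIC_DIM)
target_prompt = decode(z_true)

# Adaptation: the decoder stays frozen, only the 5 entries of z are updated.
z = np.zeros(INTRINSIC_DIM)
initial_loss, _ = loss_and_grad(z, target_prompt)
for _ in range(1000):
    _, grad = loss_and_grad(z, target_prompt)
    z -= 0.005 * grad
final_loss, _ = loss_and_grad(z, target_prompt)

# The decoded prompt would be prepended to the PLM input embeddings.
soft_prompt = decode(z).reshape(PROMPT_TOKENS, HIDDEN_SIZE)
print(f"tuned parameters: {z.size}, prompt shape: {soft_prompt.shape}")
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The point of the sketch is the parameter count: the soft prompt has `PROMPT_TOKENS * HIDDEN_SIZE` entries, but gradient updates touch only the 5-dimensional vector `z`.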


