LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

by   Longteng Zhang, et al.
Harbin Institute of Technology
Hong Kong Baptist University
The Hong Kong University of Science and Technology

The low-rank adaptation (LoRA) method can largely reduce the amount of trainable parameters for fine-tuning large language models (LLMs), however, it still requires expensive activation memory to update low-rank weights. Reducing the number of LoRA layers or using activation recomputation could harm the fine-tuning performance or increase the computational overhead. In this work, we present LoRA-FA, a memory-efficient fine-tuning method that reduces the activation memory without performance degradation and expensive recomputation. LoRA-FA chooses to freeze the projection-down weight of A and update the projection-up weight of B in each LoRA layer. It ensures the change of model weight reside in a low-rank space during LLMs fine-tuning, while eliminating the requirement to store full-rank input activations. We conduct extensive experiments across multiple model types (RoBERTa, T5, LLaMA) and model scales. Our results show that LoRA-FA can always achieve close fine-tuning accuracy across different tasks compared to full parameter fine-tuning and LoRA. Furthermore, LoRA-FA can reduce the overall memory cost by up to 1.4× compared to LoRA.


Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Adapting large-scale pretrained language models to downstream tasks via ...

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

We introduce a method that dramatically reduces fine-tuning VRAM require...

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Large Language Models (LLMs) have revolutionized Natural Language Proces...

Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

During the last stage of RLHF, a large language model is aligned to huma...

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Prompt tuning (PT), where a small amount of trainable soft (continuous) ...

Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (P...

LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models

With the burgeoning development in the realm of large language models (L...

Please sign up or login with your details

Forgot password? Click here to reset