LoRA: Low-Rank Adaptation of Large Language Models
By Edward Hu et al.
Published on Oct. 16, 2021
Table of Contents
1. Introduction
2. Problem Statement
3. Existing Solutions
4. Our Method
5. Empirical Experiments
6. Baselines
Summary
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible: using GPT-3 175B as an example, deploying independent instances of fully fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. LoRA performs on par with or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, higher training throughput, and no additional inference latency.
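To make the "injected rank decomposition matrices" concrete, below is a minimal PyTorch sketch of a single LoRA-adapted linear layer: the pre-trained weight W0 is frozen, and only a rank-r pair of matrices B and A is trained, so the layer computes h = W0 x + (alpha / r) B A x. The class name LoRALinear, the hyperparameter defaults, and the initialization constants are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of a linear layer with a frozen pre-trained weight W0 and a
    trainable low-rank update: h = W0 x + (alpha / r) * B A x."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Pre-trained weight W0: frozen during adaptation (no gradients).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)  # stand-in for loading pre-trained values
        # Rank-r factors: A starts Gaussian, B starts at zero, so the update
        # B @ A is zero at the beginning and training starts from W0.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        frozen = x @ self.weight.T                                  # frozen pre-trained path
        update = (x @ self.lora_A.T) @ self.lora_B.T * self.scaling  # low-rank trainable path
        return frozen + update


# Usage: only the low-rank factors receive gradients.
layer = LoRALinear(in_features=768, out_features=768, r=8)
x = torch.randn(2, 768)
y = layer(x)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(y.shape, trainable)  # torch.Size([2, 768]), 2 * 8 * 768 = 12288 trainable parameters
```

Because the update B A has the same shape as W0, it can be merged into the frozen weight after training (W = W0 + (alpha / r) B A), which is why LoRA adds no inference latency compared to the original model.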