ReLoRA: High-Rank Training Through Low-Rank Updates

By V. Lialin et al.
Published on Dec. 10, 2023

Table of Contents

1 Introduction
2 Method
3 Algorithm 1 ReLoRA
4 Results
4.1 Scaling up to 1.3B
4.2 Varying ReLoRA rank
5 Experiments

Summary

ReLoRA is a parameter-efficient technique for training large neural networks: rather than restricting training to a single low-rank subspace, it performs a high-rank update through a sequence of low-rank updates. It reaches performance comparable to regular full-rank training while saving RAM and improving training speed, and it significantly outperforms plain LoRA training, showing that the sequence of low-rank updates effectively approximates full-rank behavior. The study highlights the potential of parameter-efficient techniques for large-scale pre-training.

Scaled up to 1.3B parameters, ReLoRA continues to outperform LoRA and reaches a perplexity of 17.27, only slightly higher than full-rank training. Varying the ReLoRA rank shows minimal difference between ranks 128 and 512 at the 1.3B scale. Online ReLoRA, which resets more frequently, unexpectedly performs worse than regular ReLoRA at both the 250M and 1.3B scales.
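To make the "sequence of low-rank updates" concrete, the sketch below illustrates the merge-and-restart idea in PyTorch: train low-rank LoRA factors for a while, fold them into the frozen weights, reinitialize the factors, and repeat. This is a minimal illustration, not the authors' implementation; the names LinearWithLoRA and train_relora, the rank r, the reset_every interval, and the learning rate are illustrative assumptions, and the paper's optimizer-state pruning and learning-rate re-warmup are only approximated by recreating the optimizer.

```python
# Minimal sketch of the ReLoRA merge-and-restart idea (illustrative, not the paper's code).
import torch
import torch.nn as nn


class LinearWithLoRA(nn.Module):
    """A frozen dense layer plus a trainable low-rank (B @ A) correction."""

    def __init__(self, in_features: int, out_features: int, r: int = 128):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)
        # Standard LoRA init: A random, B zero, so the correction starts at zero.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.lora_B @ self.lora_A).T

    @torch.no_grad()
    def merge_and_reinit(self) -> None:
        # Fold the current low-rank update into the frozen weight, then restart
        # the factors so the next segment can learn a new low-rank direction.
        self.weight.add_(self.lora_B @ self.lora_A)
        nn.init.kaiming_uniform_(self.lora_A)
        self.lora_B.zero_()


def train_relora(model: nn.Module, batches, num_steps: int, reset_every: int = 2000):
    """Train only the LoRA factors, merging and restarting them periodically."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(trainable, lr=1e-3)
    for step, (x, y) in enumerate(batches):
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        if (step + 1) % reset_every == 0:
            for module in model.modules():
                if isinstance(module, LinearWithLoRA):
                    module.merge_and_reinit()
            # The paper also partially resets optimizer state and re-warms the
            # learning rate around each restart; recreating the optimizer is a
            # crude stand-in for that here.
            opt = torch.optim.AdamW(trainable, lr=1e-3)
        if step + 1 >= num_steps:
            break
```

Because lora_B is reset to zero at each restart, the layer's output immediately after a merge is identical to the output just before it, so each new low-rank segment starts from the merged weights without a loss spike, while the accumulated sum of merged updates can reach a rank far higher than any single segment.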