QLoRA: Efficient Finetuning of Quantized LLMs

By Tim Dettmers et al.
Published on May 23, 2023

Table of Contents

Abstract
Introduction
Background
QLoRA Finetuning
QLoRA vs. Standard Finetuning
Experimental Setup

Summary

The document presents QLoRA, an efficient finetuning approach that reduces the memory needed to finetune large language models. It introduces innovations such as 4-bit NormalFloat (NF4) quantization and Double Quantization to cut memory use while preserving performance, allowing models as large as 65B parameters to be finetuned on a single GPU. Experimental evaluations compare QLoRA with standard finetuning methods across different model architectures and datasets, demonstrating its effectiveness.
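As a rough illustration of how these pieces fit together in practice, the sketch below sets up a 4-bit NF4 base model with Double Quantization and LoRA adapters using the Hugging Face transformers, peft, and bitsandbytes libraries, which integrate the techniques described in the paper. The model id and the LoRA hyperparameters here are placeholder choices for illustration, not values taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantization settings corresponding to the two innovations named above:
# 4-bit NormalFloat (NF4) storage and Double Quantization of the
# quantization constants; compute is done in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Placeholder model id -- substitute the base model you want to finetune.
model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-llm-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters are trained in 16-bit on top of the frozen 4-bit base model;
# r, alpha, dropout, and target_modules are illustrative values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

With this setup, only the small LoRA adapter matrices receive gradients, while the base model stays frozen in 4-bit form, which is what keeps the memory footprint low enough for a single GPU.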