Lima: Less Is More for Alignment

By Chunting Zhou et al.
Published on May 18, 2023

Table of Contents

1 Introduction
2 Alignment Data
3 Training LIMA
4 Human Evaluation
5 Why is Less More? Ablations on Data Diversity, Quality, and Quantity

Summary

Large language models are trained in two stages: unsupervised pretraining from raw text to learn general-purpose representations, followed by large-scale instruction tuning and reinforcement learning to better align the model to end tasks and user preferences. The document examines the relative importance of these two stages through LIMA, a 65B-parameter LLaMa model fine-tuned with a standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling, which nonetheless achieves remarkably strong performance. The study also evaluates LIMA against state-of-the-art models and products in a human preference evaluation, with competitive results. Finally, the document analyzes how data diversity, quality, and quantity affect model performance.
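
The summary notes that LIMA's alignment step is simply supervised fine-tuning (next-token prediction) on a small curated set of prompt/response pairs, with no reinforcement learning stage. The sketch below illustrates what such a run could look like with the Hugging Face Trainer; the base checkpoint, the placeholder data, and the exact hyperparameters are illustrative assumptions rather than the paper's precise configuration (the paper fine-tunes a 65B LLaMa model for 15 epochs).

```python
# Minimal sketch of supervised fine-tuning on a small, curated
# instruction dataset, in the spirit of LIMA. The checkpoint name,
# placeholder data, and hyperparameters are illustrative assumptions,
# not the paper's exact setup.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import Dataset

# Hypothetical base checkpoint; the paper fine-tunes a 65B LLaMa model.
BASE_MODEL = "huggyllama/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# A small curated set of prompt/response pairs (placeholder examples;
# the real setting uses roughly 1,000 such pairs).
examples = [
    {
        "prompt": "Explain what instruction tuning is.",
        "response": "Instruction tuning fine-tunes a pretrained model on "
                    "prompt/response pairs so it follows user instructions.",
    },
    # ... ~1,000 curated pairs in total
]

def format_example(ex):
    # Concatenate prompt and response into one training sequence and tokenize.
    text = ex["prompt"] + "\n" + ex["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(examples).map(format_example)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lima-sft",
        num_train_epochs=15,            # the paper trains for 15 epochs
        per_device_train_batch_size=1,  # illustrative; adjust to hardware
        learning_rate=1e-5,
        lr_scheduler_type="linear",
        report_to="none",
    ),
    train_dataset=dataset,
    # Standard causal-LM objective: predict the next token over the sequence.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The key design point the paper makes is that this plain supervised loss on a tiny, carefully curated dataset is enough to align a strong pretrained model, without the large-scale RLHF pipeline used by many contemporary systems.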