Table of Contents
1 Introduction
2 Alignment Data
3 Training LIMA
4 Human Evaluation
5 Why is Less More? Ablations on Data Diversity, Quality, and Quantity
Summary
Large language models are typically trained in two stages: unsupervised pretraining on raw text to learn general-purpose representations, followed by large-scale instruction tuning and reinforcement learning to align the model with end tasks and user preferences. The document examines the relative importance of these two stages through LIMA, a language model fine-tuned with a standard supervised loss on only 1,000 prompts and responses, which nonetheless performs strongly. It also evaluates LIMA against state-of-the-art models and products, with promising results, and analyzes how the diversity, quality, and quantity of the fine-tuning data affect model performance.
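To make the fine-tuning stage concrete, here is a minimal, hypothetical sketch of the supervised loss involved: the mean negative log-likelihood of the response tokens given the prompt. The function name, token probabilities, and masking convention (scoring only response tokens, not prompt tokens) are illustrative assumptions, not details taken from LIMA itself.

```python
import math

def sft_loss(token_probs, is_response):
    """Mean negative log-likelihood over tokens where is_response[i] is True.

    token_probs: probability the model assigned to each target token.
    is_response: True for response tokens, False for prompt tokens
                 (prompt tokens are masked out of the loss, a common
                 convention; illustrative assumption here).
    """
    nlls = [-math.log(p) for p, r in zip(token_probs, is_response) if r]
    return sum(nlls) / len(nlls)

# Toy sequence of 5 tokens: the first 2 belong to the prompt (masked),
# the last 3 to the response. All numbers are made up for illustration.
probs = [0.9, 0.8, 0.5, 0.25, 0.125]
mask = [False, False, True, True, True]
print(round(sft_loss(probs, mask), 4))  # → 1.3863 (i.e. 2*ln 2)
```

Fine-tuning then consists of minimizing this loss over the 1,000 prompt-response pairs, with no reward model or preference data involved.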