Summary
The paper examines the surprising ability of large language models (LLMs) to perform complex reasoning from only few-shot prompting, and proposes model specialization: distilling this ability from large models into smaller ones. The focus is on improving smaller models' performance on multi-step math reasoning. The experiments aim to show that concentrating a smaller model's capacity on a single target ability can lift its scaling curve on that task. The paper also addresses practical challenges, such as aligning the tokenizers of the teacher and student models and the tradeoff between generic and specialized abilities that arises during specialization.
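To make the distillation recipe summarized above concrete, here is a minimal sketch of the general idea: fine-tune a smaller student model on step-by-step reasoning traces produced by a larger teacher. The model name (`google/flan-t5-base`), the single hand-written training pair, and the learning rate are illustrative assumptions, not details taken from the paper.

```python
# Sketch: distill multi-step reasoning by fine-tuning a small student model
# on (question, step-by-step solution) pairs generated by a large teacher.
# Assumptions: teacher outputs are already collected; the model name and the
# example pair below are placeholders, not the paper's actual setup.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")   # assumed student
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical teacher-generated data: question -> chain-of-thought solution.
pairs = [
    ("Natalia sold 48 clips in April and half as many in May. How many in total?",
     "April: 48. May: 48 / 2 = 24. Total: 48 + 24 = 72. The answer is 72."),
]

model.train()
for question, solution in pairs:
    inputs = tokenizer(question, return_tensors="pt", truncation=True)
    labels = tokenizer(solution, return_tensors="pt", truncation=True).input_ids
    # Standard sequence-to-sequence cross-entropy loss on the teacher's trace.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Training on the full reasoning trace, rather than only the final answer, is what pushes the student toward the multi-step behavior the summary describes; the tokenizer-alignment issue mentioned above arises when the teacher and student do not share a tokenizer, which this sketch sidesteps by distilling from plain-text outputs.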