Summary
Humans have a powerful and mysterious capacity to reason. Working through a sequence of mental steps lets us draw inferences we could not make directly, even though those steps bring in no additional data from the world. This document investigates when chain-of-thought reasoning is useful in language models, particularly when the training data consists of overlapping local clusters of variables that influence each other strongly. Combining locally structured observations with step-by-step reasoning is far more data-efficient than training on joint observations of all variables. The document also presents theoretical results and experiments comparing the conditional-inference accuracy of different estimators.
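As a minimal sketch of the intuition (not the document's actual experimental setup), consider a hypothetical chain of binary variables A → B → C where the training data contains many local (A, B) and (B, C) observations but few direct (A, C) observations. A "direct" estimator of P(C=1 | A=1) from the scarce joint data is noisy, while a "chained" estimator that marginalizes over the intermediate variable B uses the abundant local data; all parameter values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain A -> B -> C over binary variables (illustrative parameters).
p_b_given_a = np.array([[0.9, 0.1],   # P(B=0|A), P(B=1|A) for A=0
                        [0.2, 0.8]])  # ... for A=1
p_c_given_b = np.array([[0.7, 0.3],   # P(C=0|B), P(C=1|B) for B=0
                        [0.1, 0.9]])  # ... for B=1

def sample(n):
    """Draw n full (A, B, C) rows from the chain."""
    a = rng.integers(0, 2, size=n)
    b = (rng.random(n) < p_b_given_a[a, 1]).astype(int)
    c = (rng.random(n) < p_c_given_b[b, 1]).astype(int)
    return a, b, c

# "Locally structured" training data: many (A,B) and (B,C) pairs,
# but only a handful of rows where A and C are observed together.
a_ab, b_ab, _ = sample(10_000)   # local cluster: A with B
_, b_bc, c_bc = sample(10_000)   # local cluster: B with C
a_j, _, c_j = sample(50)         # scarce direct (A, C) observations

# Direct estimator of P(C=1 | A=1) from the scarce joint data.
mask = a_j == 1
direct = c_j[mask].mean() if mask.any() else float("nan")

# Chained estimator: marginalize over the intermediate variable B,
# P(C=1|A=1) = sum_b P(C=1|B=b) * P(B=b|A=1), using the local data.
phat_b1_a1 = b_ab[a_ab == 1].mean()
phat_c1_b = np.array([c_bc[b_bc == 0].mean(), c_bc[b_bc == 1].mean()])
chained = (1 - phat_b1_a1) * phat_c1_b[0] + phat_b1_a1 * phat_c1_b[1]

true_val = p_b_given_a[1] @ p_c_given_b[:, 1]  # = 0.2*0.3 + 0.8*0.9 = 0.78
print(f"true P(C=1|A=1)      = {true_val:.3f}")
print(f"direct (50 samples)  = {direct:.3f}")
print(f"chained (local data) = {chained:.3f}")
```

With abundant local data, the chained estimate concentrates near the true value, while the direct estimate's error is dominated by the small joint sample, mirroring the data-efficiency claim above.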