Solving Math Word Problems with Process- and Outcome-Based Feedback

By Jonathan Uesato et al.
Published on Nov. 28, 2022

Table of Contents

1. Introduction
2. Problem and Methods
2.1. Dataset and Evaluation Metrics
2.2. Training: Overview
2.3. Supervised Finetuning
2.4. Reward Models
2.5. Decoding
2.6. RL via Expert Iteration
2.7. Data Annotation

Summary

Recent work has shown that asking language models to generate intermediate reasoning steps improves performance on many reasoning tasks. This paper compares process-based and outcome-based approaches to training language models on a natural-language reasoning task. The study focuses on the GSM8K dataset of math word problems and evaluates the contribution of each modeling component: supervised finetuning, reward models, decoding strategies, and reinforcement learning via expert iteration. The key finding is that combining supervised learning with reward-model-based reinforcement learning substantially reduces error rates, and that process-based feedback is particularly important for reducing errors in the reasoning traces themselves. Overall, the paper offers insight into how different forms of feedback affect the training of language models for reasoning tasks.
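
To make the combination of supervised learning and reward-model-based reinforcement learning concrete, the sketch below illustrates one round of expert iteration with a reward-model filter: sample solution traces from the policy, keep the traces the reward model rates highly, and finetune on the kept traces. The helper names (`policy.sample`, `reward_model.score`, `finetune_fn`) are assumptions for illustration, not the authors' actual code.

```python
# A minimal sketch of one expert-iteration round with a reward-model filter.
# `policy`, `reward_model`, and `finetune_fn` are hypothetical stand-ins for
# whatever sampling, scoring, and training stack is actually used.

def expert_iteration_round(problems, policy, reward_model, finetune_fn,
                           samples_per_problem=8, threshold=0.5):
    """Collect model-generated solution traces that the reward model rates
    highly, then finetune the policy on that filtered set."""
    kept = []
    for problem in problems:
        for _ in range(samples_per_problem):
            trace = policy.sample(problem)              # sampled reasoning trace
            score = reward_model.score(problem, trace)  # outcome- or process-based score
            if score >= threshold:                      # keep only highly rated traces
                kept.append((problem, trace))
    return finetune_fn(policy, kept)                    # supervised update on kept traces
```

An outcome-based reward model scores only the final answer of each trace, while a process-based reward model scores the individual steps; the same filtering loop applies in either case.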