Reinforced Self-Training (ReST) for Language Modeling

By Caglar Gulcehre et al.
Published on Aug. 22, 2023

Table of Contents

1. Introduction
2. Reinforced Self-Training (ReST)
3. Experiments and analysis

Summary

Reinforced Self-Training (ReST) is proposed as a method for aligning large language models with human preferences, as an offline alternative to standard reinforcement learning from human feedback. The approach alternates between two steps: a Grow step, in which the current model generates samples to augment the training dataset, and an Improve step, in which the augmented dataset is filtered with a reward model and the model is fine-tuned on the filtered data. Experiments on machine translation benchmarks show that ReST substantially improves translation quality in a computationally efficient manner, outperforming standard supervised learning. Repeating the Improve step within a single Grow step raises performance on validation datasets, and additional Grow steps enhance the model's performance further. Overall, ReST proves to be a promising approach for improving language models with human feedback.
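To make the Grow/Improve loop concrete, here is a minimal Python sketch of the procedure described above. It is not the authors' implementation: the `policy.generate`, `policy.finetune`, and `reward_model.score` interfaces are hypothetical placeholders, and the step counts, sample counts, and reward thresholds are purely illustrative.

```python
# Minimal sketch of the ReST Grow/Improve loop (illustrative only).
# `policy` and `reward_model` are assumed objects with hypothetical
# methods; the paper's actual implementation and hyperparameters differ.

from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Example:
    prompt: str
    output: str
    reward: float


def rest(policy, reward_model, prompts: Sequence[str],
         initial_data: List[Example],
         num_grow_steps: int = 2,
         improve_steps_per_grow: int = 3,
         samples_per_prompt: int = 8,
         thresholds: Tuple[float, ...] = (0.0, 0.7, 0.9)):
    """Alternate Grow (sample from the policy to augment the dataset)
    and Improve (filter by reward, fine-tune on the survivors)."""
    dataset = list(initial_data)

    for _ in range(num_grow_steps):
        # Grow: augment the dataset with samples from the current
        # policy, each scored by the reward model.
        for prompt in prompts:
            for output in policy.generate(prompt, n=samples_per_prompt):
                dataset.append(
                    Example(prompt, output,
                            reward_model.score(prompt, output)))

        # Improve: raise the reward threshold at each step, keep only
        # high-reward samples, and fine-tune the policy on them.
        for i in range(improve_steps_per_grow):
            tau = thresholds[min(i, len(thresholds) - 1)]
            filtered = [ex for ex in dataset if ex.reward >= tau]
            policy.finetune([(ex.prompt, ex.output) for ex in filtered])

    return policy
```

The increasing threshold schedule in the Improve loop reflects the idea that later fine-tuning passes should concentrate on progressively higher-reward samples; the specific values here are assumptions for the sketch, not values from the paper.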