Proximal Policy Optimization Algorithms

By John Schulman et al.

Table of Contents

Abstract
1 Introduction
2 Background: Policy Optimization
3 Clipped Surrogate Objective
4 Adaptive KL Penalty Coefficient
5 Algorithm
6 Experiments
7 Conclusion
8 Acknowledgements
9 References

Summary

The document presents Proximal Policy Optimization (PPO), a family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, PPO performs multiple epochs of minibatch updates, retaining some of the benefits of trust region policy optimization (TRPO) while being much simpler to implement and more general. Experiments on benchmark tasks, including simulated robotic locomotion and Atari game playing, show that PPO outperforms other online policy gradient methods and strikes a favorable balance among sample complexity, simplicity, and wall-clock time.
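For concreteness, the following is a minimal NumPy sketch of the clipped surrogate objective described in Section 3. The function name, variable names, and the sample batch are illustrative, not from the paper's code; the underlying formula, L^CLIP(theta) = E_t[min(r_t(theta) A_t, clip(r_t(theta), 1 - eps, 1 + eps) A_t)] with r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t), is the one given in the paper, which suggests eps = 0.2 as a typical value.

    import numpy as np

    # Minimal sketch of PPO's clipped surrogate objective (Section 3).
    # `ratios` holds r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t);
    # `advantages` holds the advantage estimates A_t. Both names are illustrative.
    def clipped_surrogate(ratios, advantages, epsilon=0.2):
        unclipped = ratios * advantages
        clipped = np.clip(ratios, 1.0 - epsilon, 1.0 + epsilon) * advantages
        # The elementwise minimum makes the objective a pessimistic bound,
        # removing any incentive to move the ratio far outside [1-eps, 1+eps].
        return np.mean(np.minimum(unclipped, clipped))

    # Hypothetical batch of four timesteps: this scalar is what stochastic
    # gradient ascent would maximize over several epochs of minibatch updates.
    ratios = np.array([0.9, 1.0, 1.3, 0.7])
    advantages = np.array([1.0, -0.5, 2.0, -1.0])
    print(clipped_surrogate(ratios, advantages))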