Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning

By Haoxuan Pan et al.

Table of Contents

Abstract
1 Introduction
2 Related Work
3 Preliminaries
4 Bias in Policy Gradient Estimation

Summary

The paper revisits estimation bias in policy gradients for deep reinforcement learning, focusing on the discounted episodic Markov Decision Process. It discusses the state distribution shift and its impact on policy optimization under neural network parameterization. The study extends prior discussions of bias-reduction techniques such as learning-rate adjustment, adaptive optimizers, and KL regularization. Extensive experiments on continuous control tasks support the analysis, highlighting the importance of unbiased policy gradients for optimal outcomes.
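One well-known source of the state distribution shift mentioned above: for the discounted objective, the policy gradient theorem weights each visited state by gamma^t, while common implementations sample states from the undiscounted visitation distribution and drop that factor. A minimal sketch of the two per-step weightings (hypothetical illustration, not the authors' code; `policy_grad_weights` and its arguments are assumed names):

```python
import numpy as np

def policy_grad_weights(rewards, gamma=0.99, discount_states=True):
    """Per-step weights that multiply grad log pi(a_t | s_t).

    Both variants use the discounted return-to-go G_t; the unbiased
    variant (for the discounted objective) additionally weights step t
    by gamma**t, matching the discounted state-visitation distribution.
    """
    T = len(rewards)
    # Discounted return-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k
    G = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        G[t] = running
    if discount_states:
        return (gamma ** np.arange(T)) * G  # unbiased for the discounted objective
    return G  # common practice: gamma^t dropped, hence biased

rewards = [1.0, 1.0, 1.0]
biased = policy_grad_weights(rewards, gamma=0.5, discount_states=False)
unbiased = policy_grad_weights(rewards, gamma=0.5, discount_states=True)
# biased   -> [1.75, 1.5, 1.0]
# unbiased -> [1.75, 0.75, 0.25]
```

The two estimators agree only at t = 0; later steps are down-weighted by gamma^t in the unbiased version, which is exactly the discrepancy this line of work analyzes.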