Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
By Haoxuan Pan et al.
Table of Contents
Abstract
1 Introduction
2 Related Work
3 Preliminaries
4 Bias in Policy Gradient Estimation
Summary
The paper revisits estimation bias in policy gradients for deep reinforcement learning, focusing on the discounted episodic Markov Decision Process (MDP). It analyzes the state distribution shift between the discounted objective and the undiscounted sampling used in practice, and its impact on policy optimization under neural-network parameterization. The study extends previous discussions of bias reduction techniques such as learning-rate adjustment, adaptive optimizers, and KL regularization. Extensive experiments on continuous control tasks support the analysis, highlighting the importance of unbiased policy gradients for optimal outcomes.
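As background for the state distribution shift discussed above, the standard (textbook) policy gradient theorem for the discounted objective can be contrasted with the estimator most implementations actually compute; this sketch uses conventional notation and is not taken verbatim from the paper:

```latex
% Discounted objective over an episodic MDP:
%   J(\theta) = \mathbb{E}_{\pi_\theta}\big[\sum_{t=0}^{T} \gamma^t r(s_t, a_t)\big]
% The policy gradient theorem weights states by the *discounted* visitation
% distribution d_\gamma^\pi:
\nabla_\theta J(\theta)
  \;\propto\; \sum_{s} d_\gamma^{\pi}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a),
\qquad
d_\gamma^{\pi}(s) = (1-\gamma) \sum_{t=0}^{\infty} \gamma^t \Pr(s_t = s \mid \pi_\theta).
% Common deep RL implementations instead average over states sampled from the
% undiscounted visitation distribution d^\pi(s) (dropping the \gamma^t weight
% on states), which introduces the estimation bias the paper revisits.
```

The gap between $d_\gamma^{\pi}$ and the undiscounted $d^{\pi}$ is the "state distribution shift" referenced in the summary.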