How Does Value Distribution in Distributional Reinforcement Learning Help Optimization?

09/29/2022
by   Ke Sun, et al.
0

We consider the problem of learning a set of probability distributions from the Bellman dynamics in distributional reinforcement learning (RL) that learns the whole return distribution compared with only its expectation in classical RL. Despite its success to obtain superior performance, we still have a poor understanding of how the value distribution in distributional RL works. In this study, we analyze the optimization benefits of distributional RL by leverage of additional value distribution information over classical RL in the Neural Fitted Z-Iteration (Neural FZI) framework. To begin with, we demonstrate that the distribution loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients, which is in line with its tendency to promote optimization stability. Furthermore, the acceleration effect of distributional RL is revealed by decomposing the return distribution. It turns out that distributional RL can perform favorably if the value distribution approximation is appropriate, measured by the variance of gradient estimates in each environment for any specific distributional RL algorithm. Rigorous experiments validate the stable optimization behaviors of distributional RL, contributing to its acceleration effects compared to classical RL. The findings of our research illuminate how the value distribution in distributional RL algorithms helps the optimization.

READ FULL TEXT

page 8

page 19

research
10/07/2021

Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm

Distributional reinforcement learning (RL) is a class of state-of-the-ar...
research
10/26/2021

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

A growing trend for value-based reinforcement learning (RL) algorithms i...
research
05/25/2023

The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

While distributional reinforcement learning (RL) has demonstrated empiri...
research
03/20/2021

Bayesian Distributional Policy Gradients

Distributional Reinforcement Learning (RL) maintains the entire probabil...
research
06/06/2021

Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks

The distributional reinforcement learning (RL) approach advocates for re...
research
09/17/2021

Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations

In real scenarios, state observations that an agent observes may contain...
research
08/03/2023

Bag of Policies for Distributional Deep Exploration

Efficient exploration in complex environments remains a major challenge ...

Please sign up or login with your details

Forgot password? Click here to reset