Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks

06/06/2021
by   Thibaut Théate, et al.

The distributional reinforcement learning (RL) approach advocates representing the complete probability distribution of the random return instead of modelling only its expectation. A distributional RL algorithm may be characterised by two main components: the representation and parameterisation of the distribution, and the probability metric defining the loss. This research considers the unconstrained monotonic neural network (UMNN) architecture, a universal approximator of continuous monotonic functions that is particularly well suited to modelling different representations of a distribution (PDF, CDF, quantile function). This property makes it possible to decouple the effect of the function approximator class from that of the probability metric. The paper first introduces a methodology for learning different representations of the random return distribution. Second, it presents a novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN). Lastly, in light of this new algorithm, it compares three probability quasimetrics empirically: the Kullback-Leibler divergence, the Cramér distance and the Wasserstein distance. The results call for a reconsideration of all probability metrics in distributional RL, which contrasts with the dominance of the Wasserstein distance in recent publications.
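
To make the three quantities named in the abstract concrete, the following is a minimal sketch that evaluates them for two discrete distributions on a shared, evenly spaced support. It is not the paper's implementation: the function names, the example vectors `p` and `q`, and the grid spacing `dz` are all illustrative assumptions. The Cramér distance is taken here as the L2 norm of the difference between the CDFs (some papers use its square), and the Wasserstein distance is the 1-Wasserstein distance, computed as the L1 norm of the same difference.

```python
import math
from itertools import accumulate


def cdf_gaps(p, q):
    """Pointwise difference between the two cumulative distributions."""
    return [fp - fq for fp, fq in zip(accumulate(p), accumulate(q))]


def kl_divergence(p, q):
    """KL(p || q) on a shared support.

    Terms with p_i = 0 contribute nothing. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cramer_distance(p, q, dz):
    """L2 norm of the CDF difference (assumed convention; some use the square)."""
    return math.sqrt(dz * sum(g * g for g in cdf_gaps(p, q)))


def wasserstein_1(p, q, dz):
    """1-Wasserstein distance in one dimension: L1 norm of the CDF difference."""
    return dz * sum(abs(g) for g in cdf_gaps(p, q))


# Hypothetical return distributions over the support {0, 1, 2, 3}.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
dz = 1.0  # spacing between adjacent support points

print("KL divergence  :", kl_divergence(p, q))
print("Cramer distance:", cramer_distance(p, q, dz))
print("Wasserstein-1  :", wasserstein_1(p, q, dz))
```

Note that the KL divergence compares probability masses point by point and ignores how far apart the support points are, whereas the Cramér and Wasserstein distances work on the CDFs and therefore depend on the geometry of the support. This difference is one reason the choice of metric can matter in practice.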

