How to Enable Uncertainty Estimation in Proximal Policy Optimization

10/07/2022
by Eugene Bykovets, et al.

While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, such as Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons: first, concepts like uncertainty and OOD states are not well defined for on-policy RL methods compared to supervised learning; second, available implementations and comparative studies of uncertainty estimation methods in RL have been limited. To close the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely proximal policy optimization (PPO), and present applicable measures; in particular, we discuss the concepts of value and policy uncertainty. We address the second point by implementing different uncertainty estimation methods and comparing them across a number of environments. OOD detection performance is evaluated via a custom benchmark of in-distribution (ID) and OOD states for various RL environments. We identify a trade-off between reward and OOD detection performance. To overcome this, we formulate a Pareto optimization problem in which we simultaneously optimize for reward and OOD detection performance. We show experimentally that the recently proposed Masksembles method strikes a favourable balance among the surveyed methods, enabling high-quality uncertainty estimation and OOD detection while matching the performance of the original RL agents.
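To make the idea of value uncertainty concrete, the following is a minimal NumPy sketch, not the paper's implementation: a toy two-layer critic whose value estimate is sampled under stochastic masks. With fresh random masks per pass this is MC Dropout; with a small fixed bank of masks it approximates the Masksembles idea (the original method constructs masks with controlled overlap, which is omitted here for brevity). All names, layer sizes, and the mask scheme are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy critic (value head); sizes and weights are illustrative.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 16))
W2 = rng.normal(scale=0.5, size=(16, 1))

def critic_value(state, mask, keep_prob):
    h = np.maximum(state @ W1, 0.0)   # ReLU hidden layer
    h = h * mask / keep_prob          # dropout-style masking, kept active at inference
    return (h @ W2).item()

def mc_dropout_uncertainty(state, n_samples=200, p_drop=0.5):
    # MC Dropout: a fresh random mask per forward pass; the spread of the
    # sampled values serves as an epistemic value-uncertainty estimate.
    vals = [critic_value(state, rng.random(16) > p_drop, 1.0 - p_drop)
            for _ in range(n_samples)]
    return float(np.mean(vals)), float(np.std(vals))

# Masksembles-style variant: a small fixed bank of binary masks
# (random here; the original method controls their overlap), giving
# ensemble-like diversity at close to single-network cost.
MASKS = (rng.random((4, 16)) > 0.5).astype(float)

def masksembles_uncertainty(state, keep_prob=0.5):
    vals = [critic_value(state, m, keep_prob) for m in MASKS]
    return float(np.mean(vals)), float(np.std(vals))

state = rng.normal(size=4)
mean_v, std_v = mc_dropout_uncertainty(state)
# A std that is large relative to typical in-distribution values can be
# thresholded to flag a state as OOD.
```

In this sketch, OOD detection reduces to thresholding `std_v`; the trade-off the abstract mentions arises because masking that is aggressive enough to give informative spreads can also degrade the agent's reward.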


