Constrained Policy Optimization via Bayesian World Models

01/24/2022
by Yarden As, et al.

Improving sample efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes, real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
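To make the abstract's core idea concrete, here is a minimal NumPy sketch of how posterior model uncertainty can yield an optimistic bound on return and a pessimistic bound on constraint cost. It assumes a candidate policy has been evaluated under several world models sampled from a Bayesian posterior; the function names, the max-over-samples bounds, and the Lagrangian-style penalty are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def optimistic_objective(returns_per_model):
    """Optimistic upper bound on the task objective: the best-case
    return across world models sampled from the Bayesian posterior."""
    # returns_per_model: shape (n_models,), one Monte Carlo return
    # estimate of the candidate policy per posterior model sample.
    return np.max(returns_per_model)

def pessimistic_cost(costs_per_model):
    """Pessimistic upper bound on the constraint: the worst-case
    cumulative cost across the same posterior model samples."""
    return np.max(costs_per_model)

def constrained_policy_score(returns_per_model, costs_per_model,
                             cost_budget, penalty=100.0):
    """Score a candidate policy: maximize the optimistic return while
    penalizing any violation of the pessimistic cost bound. The fixed
    penalty stands in for the paper's constrained-optimization machinery."""
    violation = max(0.0, pessimistic_cost(costs_per_model) - cost_budget)
    return optimistic_objective(returns_per_model) - penalty * violation

# Example: one policy evaluated under 5 posterior model samples.
rng = np.random.default_rng(0)
returns = rng.normal(10.0, 2.0, size=5)  # imagined returns per model
costs = rng.normal(1.0, 0.5, size=5)     # imagined constraint costs
print(constrained_policy_score(returns, costs, cost_budget=2.0))
```

Taking the best case over posterior samples for the objective encourages exploration where the models disagree, while taking the worst case for the cost keeps the policy conservative with respect to the safety constraint.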


