Constrained Policy Optimization via Bayesian World Models

01/24/2022
by Yarden As, et al.

Improving sample efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes, real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
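
To make the optimistic/pessimistic bound idea concrete, here is a minimal Python sketch. It assumes each posterior sample of the Bayesian world model is a callable mapping (state, action) to (next_state, reward, cost); the names `rollout` and `score_policy`, and the simple Lagrangian penalty, are illustrative rather than the authors' implementation (the paper optimizes over imagined trajectories with a constrained objective).

```python
import numpy as np

def rollout(policy, model, state, horizon):
    """Imagined rollout through one sampled world model.

    `policy` maps a state to an action; `model` maps (state, action) to
    (next_state, reward, cost). Returns accumulated reward and cost.
    """
    total_reward, total_cost = 0.0, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward, cost = model(state, action)
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost

def score_policy(policy, posterior_models, init_state, horizon,
                 cost_budget, lagrange_multiplier):
    """Score a candidate policy: optimistic upper bound on the task
    objective, pessimistic upper bound on the accumulated safety cost."""
    outcomes = [rollout(policy, m, init_state, horizon)
                for m in posterior_models]        # one rollout per posterior sample
    returns = np.array([r for r, _ in outcomes])
    costs = np.array([c for _, c in outcomes])
    optimistic_return = returns.max()   # best case over the model posterior
    pessimistic_cost = costs.max()      # worst case over the model posterior
    # Penalize the policy only when the worst-case cost exceeds the budget.
    violation = max(pessimistic_cost - cost_budget, 0.0)
    return optimistic_return - lagrange_multiplier * violation
```

In practice one would average several sampled trajectories per posterior model rather than rely on a single rollout, but the structure above captures the key point: uncertainty over world models is turned into optimism for exploration and pessimism for constraint satisfaction.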

Related research

- Safe Reinforcement Learning in Constrained Markov Decision Processes (08/15/2020)
  Safe reinforcement learning has been a promising approach for optimizing...

- Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty (10/10/2020)
  In this paper, we focus on the problem of robustifying reinforcement lea...

- Safe Policy Improvement in Constrained Markov Decision Processes (10/20/2022)
  The automatic synthesis of a policy through reinforcement learning (RL) ...

- Safety-Constrained Policy Transfer with Successor Features (11/10/2022)
  In this work, we focus on the problem of safe policy transfer in reinfor...

- Certifying Neural Network Robustness to Random Input Noise from Samples (10/15/2020)
  Methods to certify the robustness of neural networks in the presence of ...

- Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning (11/15/2019)
  Off-policy policy evaluation (OPE) is the problem of estimating the onli...
