Constrained Policy Optimization via Bayesian World Models

by Yarden As et al.

Improving sample efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes, real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
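The core idea of the abstract can be sketched in a few lines: given a posterior over world models, evaluate a candidate policy under several posterior samples, take an optimistic (best-case) estimate of the task return and a pessimistic (worst-case) estimate of the constraint cost, and combine them in a Lagrangian-style penalized objective. The sketch below is illustrative only, not the paper's implementation; the function names, the use of a max over posterior samples as the "upper bound", and the fixed Lagrange multiplier are all assumptions.

```python
import numpy as np

def optimistic_objective(returns_per_model):
    # Optimistic upper bound on the task objective: best-case return
    # across the posterior model samples (illustrative choice).
    return np.max(returns_per_model)

def pessimistic_constraint(costs_per_model):
    # Pessimistic upper bound on the constraint cost: worst-case cost
    # across the posterior model samples (illustrative choice).
    return np.max(costs_per_model)

def penalized_objective(returns_per_model, costs_per_model, lam, budget):
    # CMDP-style penalized objective: reward optimism on the return,
    # penalize pessimistic cost in excess of the safety budget.
    # `lam` (the Lagrange multiplier) is held fixed here for simplicity.
    return (optimistic_objective(returns_per_model)
            - lam * (pessimistic_constraint(costs_per_model) - budget))

# Toy usage: Monte Carlo estimates from 5 posterior model samples.
returns = np.array([10.0, 12.5, 9.8, 11.1, 10.4])
costs = np.array([0.2, 0.5, 0.3, 0.4, 0.1])
value = penalized_objective(returns, costs, lam=1.0, budget=0.25)
```

In practice one might replace the hard `max` with an upper quantile of the posterior samples for a less conservative bound, and adapt the multiplier `lam` during training; both choices are orthogonal to the optimism/pessimism split shown here.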




Related Papers

- Safe Reinforcement Learning in Constrained Markov Decision Processes
- Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
- Safe Policy Improvement in Constrained Markov Decision Processes
- Safety-Constrained Policy Transfer with Successor Features
- Certifying Neural Network Robustness to Random Input Noise from Samples
- Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
