Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability

by Yanran Wang, et al.
Imperial College London

Reinforcement learning and optimal control provide effective reasoning for sequential decision-making problems with variable dynamics. In practical implementations, however, such reasoning poses a persistent challenge: interpreting the reward function and the corresponding optimal policy. Formalizing sequential decision-making as probabilistic inference therefore has considerable value, since inference in principle offers diverse and powerful mathematical tools for inferring stochastic dynamics while suggesting a probabilistic interpretation of reward design and policy convergence. In this study, we propose Adaptive Wasserstein Variational Optimization (AWaVO), a novel approach to these challenges in sequential decision-making. Our approach uses formal methods to provide interpretation of reward design, transparency of training convergence, and a probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable trade-off between high performance and conservative interpretability.
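The name AWaVO suggests a variational objective measured with a Wasserstein divergence rather than the usual KL term. As an illustrative aside (this is not the paper's algorithm, and all names below are hypothetical), the 1-Wasserstein distance between two one-dimensional empirical distributions, say action samples from a candidate policy and from a target posterior, can be computed directly from sorted samples:

```python
# Illustrative sketch only: a 1-D empirical Wasserstein-1 distance, the kind of
# divergence a Wasserstein variational objective could penalize between a
# candidate policy's action samples and a target (posterior) distribution.
# NOT the paper's AWaVO algorithm; sample values here are made up.

def wasserstein1(xs, ys):
    """W1 between two equally sized 1-D empirical distributions.

    For sorted samples x_(1) <= ... <= x_(n) and y_(1) <= ... <= y_(n),
    optimal transport on the line pairs order statistics, giving
    W1 = (1/n) * sum_i |x_(i) - y_(i)|.
    """
    if len(xs) != len(ys):
        raise ValueError("expected equally sized sample sets")
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Example: actions sampled from a current policy vs. a target distribution.
policy_actions = [0.1, 0.4, 0.9, 1.2]
target_actions = [0.3, 0.5, 1.0, 1.1]
print(wasserstein1(policy_actions, target_actions))  # small value: close distributions
```

Unlike the KL divergence, this metric stays finite and informative even when the two distributions have disjoint support, which is one common motivation for Wasserstein-based variational objectives.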


Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

The framework of reinforcement learning or optimal control provides a ma...

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of...

Modeling and Interpreting Real-world Human Risk Decision Making with Inverse Reinforcement Learning

We model human decision-making behaviors in a risk-taking task using inv...

CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem

Inverse reinforcement learning (IRL) is used to infer the reward functio...

Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty

We propose SLTD (`Sequential Learning-to-Defer') a framework for learnin...

Optimal Control as Variational Inference

In this article we address the stochastic and risk sensitive optimal con...
