Optimal Control as Variational Inference

by Tom Lefebvre, et al.
Ghent University

In this article we address the stochastic and risk-sensitive optimal control problem probabilistically, decomposing and solving the probabilistic models using principles from variational inference. We demonstrate how this culminates in two separate probabilistic inference procedures that allow the deterministic optimal policy to be inferred iteratively. More formally, a sequence of belief policies, serving as a probabilistic proxy for the deterministic optimal policy, is specified through a fixed-point iteration whose equilibrium point coincides with the deterministic solution. These results re-establish the paradigm of Control as Inference, a concept originally explored and exploited by the Reinforcement Learning community in anticipation of deep-rooted connections between optimal estimation and control. Although the Control as Inference paradigm has already resulted in the development of several Reinforcement Learning algorithms, until now the underlying mechanisms were only partially understood. For that reason, Control as Inference has not been well received by the control community. By exposing the underlying mechanisms we aim to contribute to its general acceptance as a framework superseding optimal control. To exhibit its general relevance, we discuss parallels with path integral control and a wide range of possible applications.




Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

The framework of reinforcement learning or optimal control provides a ma...

On Entropic Optimization and Path Integral Control

This article is motivated by the question whether it is possible to solv...

Variational Inference MPC using Tsallis Divergence

In this paper, we provide a generalized framework for Variational Infere...

Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability

Reinforcement Learning or optimal control can provide effective reasonin...

CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies

Reinforcement Learning has drawn huge interest as a tool for solving opt...

Sparsity Inducing Representations for Policy Decompositions

Policy Decomposition (PoDec) is a framework that lessens the curse of di...
