VIREL: A Variational Inference Framework for Reinforcement Learning

by Matthew Fellows, et al.

Applying probabilistic models to reinforcement learning (RL) has become an exciting direction of research, as powerful optimisation tools such as variational inference become applicable to RL. However, due to their formulation, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, for example, the absence of mode-capturing behaviour in pseudo-likelihood methods and difficulties in optimising the learning objective in maximum entropy RL based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises the action-value function in a parametrised form to capture the future dynamics of the underlying Markov decision process. Owing to its generality, our framework lends itself to current advances in variational inference. Applying the variational expectation-maximisation algorithm to our framework, we show that the actor-critic algorithm can be reduced to expectation-maximisation. We derive a family of methods from our framework, including state-of-the-art methods based on soft value functions. We evaluate two actor-critic algorithms derived from this family, which perform on par with soft actor-critic, demonstrating that our framework offers a promising perspective on RL as inference.
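The actor-critic-as-EM correspondence described above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a hypothetical toy with a single-state MDP, made-up rewards, and a fixed temperature, where the E-step fits the actor to the Boltzmann distribution induced by the current Q estimates and the M-step updates the critic towards observed rewards.

```python
import numpy as np

# Hypothetical toy: one state, three actions, fixed rewards.
rng = np.random.default_rng(0)
rewards = np.array([1.0, 2.0, 0.5])

q = np.zeros(3)          # critic: parametrised action-value estimates
policy = np.ones(3) / 3  # actor: variational distribution over actions
temperature = 0.5        # assumed fixed; VIREL adapts this quantity

for _ in range(200):
    # E-step (actor): fit the variational policy to the Boltzmann
    # distribution over the current action-value estimates.
    logits = q / temperature
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()

    # M-step (critic): sample an action from the actor and move its
    # Q estimate towards the observed reward.
    a = rng.choice(3, p=policy)
    q[a] += 0.1 * (rewards[a] - q[a])

print(policy.argmax())  # policy concentrates on the highest-reward action
```

Alternating these two steps monotonically sharpens the actor around the critic's current best action, which is the EM structure the abstract refers to; the full framework replaces the tabular Q with a parametrised function approximator.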


TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control

Due to their complex nonlinear dynamics and batch-to-batch variability, ...

Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning

The goal of this paper is to provide a unifying view of a wide range of ...

Reinforcement learning for automatic quadrilateral mesh generation: a soft actor-critic approach

This paper proposes, implements, and evaluates a reinforcement learning ...

Variational Inference with Tail-adaptive f-Divergence

Variational inference with α-divergences has been widely used in modern ...

Reinforcement Learning Provides a Flexible Approach for Realistic Supply Chain Safety Stock Optimisation

Although safety stock optimisation has been studied for more than 60 yea...

MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks

Fast and efficient transport protocols are the foundation of an increasi...
