
Outcome-Driven Reinforcement Learning via Variational Inference

by Tim G. J. Rudner et al.

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards. To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator reminiscent of the standard Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
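To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact variational operator) of a goal-conditioned Q-learning backup in which the hand-designed reward is replaced by the log-likelihood of achieving the desired outcome — here simply a Gaussian density on the distance between the next state and the goal. The MDP, the Gaussian likelihood, and all parameter values are assumptions made for illustration:

```python
import numpy as np

def outcome_driven_backup(q, transitions, goal, gamma=0.95, sigma=1.0, lr=0.5):
    """One sweep of a goal-conditioned Q-learning update.

    Instead of a manually shaped reward, each transition is scored by the
    log-density of the goal under a Gaussian centered at the next state
    (an illustrative stand-in for the learned outcome likelihood).
    """
    for (s, a, s_next) in transitions:
        # "Reward" = log N(goal; s_next, sigma^2): higher when s_next is near the goal.
        log_lik = -0.5 * ((goal - s_next) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
        # Standard temporal-difference target built on that log-likelihood.
        target = log_lik + gamma * q[s_next].max()
        q[s, a] += lr * (target - q[s, a])
    return q
```

On a five-state chain with the goal at one end, repeated sweeps drive the Q-values to prefer actions that move toward the goal, without any task-specific reward engineering.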



