Outcome-Driven Reinforcement Learning via Variational Inference

04/20/2021
by Tim G. J. Rudner, et al.

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards. To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator reminiscent of the standard Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
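For orientation, the standard Bellman backup that the abstract uses as its point of comparison can be sketched on a toy goal-reaching problem. This is only the textbook operator with a hand-written sparse reward, not the probabilistic operator or learned reward derived in the paper; the five-state chain environment and every name below (`step`, `reward`, `bellman_backup`, `solve`, `greedy_policy`) are assumptions made for illustration.

```python
# Textbook Bellman optimality backup on a hypothetical 5-state chain.
# NOT the paper's probabilistic backup operator; illustrative only.

GAMMA = 0.9          # discount factor (assumed value)
N_STATES = 5         # states 0..4 laid out on a line
GOAL = 4             # desired outcome: reach the right-most state
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(next_state):
    """Hand-designed sparse reward: 1 on entering the goal, else 0."""
    return 1.0 if next_state == GOAL else 0.0


def bellman_backup(q):
    """Apply one backup: Q(s, a) <- r(s') + gamma * max_b Q(s', b)."""
    new_q = {}
    for s in range(N_STATES):
        for a in ACTIONS:
            if s == GOAL:
                new_q[(s, a)] = 0.0  # goal is treated as terminal
                continue
            s2 = step(s, a)
            best_next = max(q[(s2, b)] for b in ACTIONS)
            new_q[(s, a)] = reward(s2) + GAMMA * best_next
    return new_q


def solve(n_iters=50):
    """Iterate the backup from an all-zero Q-table."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_iters):
        q = bellman_backup(q)
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)]


if __name__ == "__main__":
    q = solve()
    print(greedy_policy(q)[:GOAL])  # actions chosen in the non-goal states
```

In this sketch the reward signal has to be specified by hand and is zero everywhere except at the goal, which is the kind of manual design decision the abstract says the outcome-driven formulation is meant to remove.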


