Contrastive Example-Based Control

07/24/2023
by Kyle Hatch, et al.

While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples of the transition dynamics and examples of high-return states. These methods typically learn a reward function from the high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to the labeled transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.
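To make the idea of an implicit multi-step model more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the implicit model is a contrastive critic that scores (state-action, future-state) embedding pairs with an inner product, trained with an InfoNCE-style loss. It also assumes that a Q-like score can be read off by averaging the exponentiated critic over the success examples. The function names `infonce_loss` and `example_based_score`, and the use of precomputed embeddings, are hypothetical choices made for illustration.

```python
import numpy as np

def infonce_loss(phi_sa, psi_future):
    """InfoNCE-style loss over a batch (assumed training objective).

    phi_sa:     (B, D) embeddings of state-action pairs.
    psi_future: (B, D) embeddings of future states; row i is the positive
                for row i of phi_sa, all other rows act as negatives.
    """
    logits = phi_sa @ psi_future.T                       # (B, B) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # positives lie on the diagonal

def example_based_score(phi_sa, psi_examples):
    """Q-like score (assumed form): mean exp-similarity to success examples.

    phi_sa:       (N, D) embeddings of candidate state-action pairs.
    psi_examples: (K, D) embeddings of high-return example states.
    """
    return np.exp(phi_sa @ psi_examples.T).mean(axis=1)  # shape (N,)
```

Under these assumptions, a policy could prefer actions whose score is higher, with no separate reward function and no temporal difference update. The sketch leaves out the encoders, the sampling of future states from the offline data, and the policy update.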


research
11/03/2022

Contrastive Value Learning: Implicit Models for Simple Offline RL

Model-based reinforcement learning (RL) methods are appealing in the off...
research
03/23/2021

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

In the standard Markov decision process formalism, users specify tasks b...
research
01/29/2018

Learning the Reward Function for a Misspecified Model

In model-based reinforcement learning it is typical to treat the problem...
research
04/06/2023

Robust Decision-Focused Learning for Reward Transfer

Decision-focused (DF) model-based reinforcement learning has recently be...
research
01/25/2022

Dynamics-Aware Comparison of Learned Reward Functions

The ability to learn reward functions plays an important role in enablin...
research
11/28/2017

Hierarchical Policy Search via Return-Weighted Density Estimation

Learning an optimal policy from a multi-modal reward function is a chall...
research
06/24/2020

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

We propose a simple, practical, and intuitive approach for domain adapta...
