Dichotomy of Control: Separating What You Can Control from What You Cannot

10/24/2022
by Mengjiao Yang, et al.

Future- or return-conditioned supervised learning is an emerging paradigm for offline reinforcement learning (RL), where the future outcome (i.e., return) associated with an observed action sequence is used as input to a policy trained to imitate those same actions. While return-conditioning is at the heart of popular algorithms such as Decision Transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than from the actions themselves. Such situations can lead to a learned policy that is inconsistent with its conditioning inputs; i.e., using the policy to act in the environment while conditioning on a specific desired return yields a distribution of realized returns that is wildly different from the one desired. In this work, we propose the dichotomy of control (DoC), a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity). We achieve this separation by conditioning the policy on a latent variable representation of the future, and designing a mutual information constraint that removes any information from the latent variable associated with randomness in the environment. Theoretically, we show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior. Empirically, we show that DoC achieves significantly better performance than DT on environments that have highly stochastic rewards and transitions.
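To make the mechanism concrete, the following is a minimal, hedged sketch of what one DoC-style training step could look like, based only on the abstract above: a future encoder produces a latent z, the policy imitates the logged action conditioned on (state, z), and an adversarial auxiliary predictor acts as a crude stand-in for the mutual information constraint. All module names, shapes, and the "leak" predictor are illustrative assumptions; the paper itself works with sequence models and a contrastive bound on the conditional mutual information, not this exact scheme.

```python
# Illustrative DoC-style training step (PyTorch). Assumptions, not the authors' code.
import torch
import torch.nn.functional as F
from torch import nn

state_dim, act_n, future_dim, latent_dim, outcome_dim = 8, 4, 16, 6, 9  # assumed sizes

# Encodes the observed future trajectory into a latent variable z.
encoder = nn.Sequential(nn.Linear(future_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# Future-conditioned policy: action logits from (state, z).
policy = nn.Sequential(nn.Linear(state_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, act_n))
# Auxiliary "leak" predictor: tries to recover environment stochasticity (e.g., reward,
# next state) from z; used here as an adversarial proxy for the MI constraint.
leak = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, outcome_dim))

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=1e-3)
opt_leak = torch.optim.Adam(leak.parameters(), lr=1e-3)

def doc_step(state, action, future, stochastic_outcome, mi_weight=1.0):
    z = encoder(future)

    # (1) Future-conditioned behavior cloning: imitate the logged action given (state, z).
    bc_loss = F.cross_entropy(policy(torch.cat([state, z], dim=-1)), action)

    # (2) Train the leak predictor to recover the stochastic outcome from a detached z.
    leak_loss = F.mse_loss(leak(z.detach()), stochastic_outcome)
    opt_leak.zero_grad(); leak_loss.backward(); opt_leak.step()

    # (3) Penalize the encoder when z is predictive of the stochastic outcome: a crude
    #     adversarial stand-in for minimizing I(z; randomness | state, action).
    penalty = -F.mse_loss(leak(z), stochastic_outcome)
    loss = bc_loss + mi_weight * penalty
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return bc_loss.item(), leak_loss.item()

# Dummy batch, only to show the expected shapes.
B = 32
doc_step(torch.randn(B, state_dim), torch.randint(0, act_n, (B,)),
         torch.randn(B, future_dim), torch.randn(B, outcome_dim))
```

At evaluation time the idea would be to condition on a latent chosen to represent a desirable controllable future, so that the realized returns stay consistent with the conditioning input rather than chasing outcomes that were due to environment luck.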


Related research

05/31/2022  You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
06/02/2022  When does return-conditioned supervised learning work for offline reinforcement learning?
05/26/2023  Future-conditioned Unsupervised Pretraining for Decision Transformer
09/12/2023  ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning
10/11/2022  ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
07/04/2022  Goal-Conditioned Generators of Deep Policies
02/10/2023  Long-Context Language Decision Transformers and Exponential Tilt for Interactive Text Environments
