You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments

05/31/2022
by   Keiran Paster, et al.
0

Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments since trajectories that result in a return may have only achieved that return due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent from environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this in several challenging stochastic offline-RL tasks including the challenging puzzle game 2048, and Connect Four playing against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines.

READ FULL TEXT
research
10/24/2022

Dichotomy of Control: Separating What You Can Control from What You Cannot

Future- or return-conditioned supervised learning is an emerging paradig...
research
06/24/2023

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Despite the recent advancements in offline reinforcement learning via su...
research
06/22/2023

Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Most offline reinforcement learning (RL) algorithms return a target poli...
research
10/11/2022

ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning

The goal of offline reinforcement learning (RL) is to learn near-optimal...
research
02/10/2023

Long-Context Language Decision Transformers and Exponential Tilt for Interactive Text Environments

Text-based game environments are challenging because agents must deal wi...
research
05/26/2023

Future-conditioned Unsupervised Pretraining for Decision Transformer

Recent research in offline reinforcement learning (RL) has demonstrated ...
research
03/15/2012

Parametric Return Density Estimation for Reinforcement Learning

Most conventional Reinforcement Learning (RL) algorithms aim to optimize...

Please sign up or login with your details

Forgot password? Click here to reset