Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning

11/06/2022
by Dan Elbaz, et al.

Offline reinforcement learning (RL) algorithms learn to make decisions from a given, fixed training dataset, with no possibility of additional online data collection. This setting is attractive because it promises to make use of previously collected datasets without any costly or risky interaction with the environment. However, the same restriction is also the setting's main drawback. The restricted dataset induces subjective uncertainty, because the agent can encounter unfamiliar sequences of states and actions that the training data did not cover. Moreover, inherent system stochasticity further increases uncertainty and aggravates the offline RL problem, preventing the agent from learning an optimal policy. To mitigate the destructive effects of uncertainty, we need to balance the aspiration to take reward-maximizing actions against the risk incurred by incorrect ones. In financial economics, modern portfolio theory (MPT) is a method that risk-averse investors can use to construct diversified portfolios that maximize their returns without unacceptable levels of risk. We integrate MPT into the agent's decision-making process to present a simple yet highly effective risk-aware planning algorithm for offline RL. Our algorithm allows us to systematically account for both the estimated quality of specific actions and their estimated risk due to uncertainty. We show that our approach can be coupled with the Transformer architecture to yield a state-of-the-art planner for offline RL tasks, maximizing the return while significantly reducing the variance.
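The trade-off between estimated return and estimated risk can be illustrated with a generic mean-variance rule in the spirit of MPT. The sketch below is not the paper's algorithm: the function names, the `risk_aversion` parameter, and the sample values are all hypothetical, and it assumes that several return estimates per candidate action are already available (for example, from sampled rollouts of a learned model).

```python
from statistics import fmean, pvariance


def mean_variance_score(return_samples, risk_aversion):
    """Mean estimated return minus a penalty proportional to its variance.

    Generic mean-variance utility; an illustrative assumption, not the
    scoring rule used in the paper.
    """
    return fmean(return_samples) - risk_aversion * pvariance(return_samples)


def select_action(candidates, risk_aversion):
    """Return the index of the candidate action with the highest score."""
    scores = [mean_variance_score(samples, risk_aversion) for samples in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical return estimates: one list of samples per candidate action.
candidates = [
    [10.0, 10.0, 10.0, 10.0],  # moderate mean return, no spread
    [30.0, -5.0, 25.0, -2.0],  # higher mean return, large spread
]

risk_neutral_choice = select_action(candidates, risk_aversion=0.0)
risk_averse_choice = select_action(candidates, risk_aversion=1.0)
```

With `risk_aversion=0.0` the rule reduces to picking the highest mean estimate, so the second action is chosen; with a positive penalty the large spread of the second action outweighs its higher mean and the first action is chosen instead.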

research
03/14/2023

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect...
research
12/05/2022

Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation

Amazon and other e-commerce sites must employ mechanisms to protect thei...
research
07/12/2021

Conservative Offline Distributional Reinforcement Learning

Many reinforcement learning (RL) problems in practice are offline, learn...
research
11/09/2022

Leveraging Offline Data in Online Reinforcement Learning

Two central paradigms have emerged in the reinforcement learning (RL) co...
research
10/24/2021

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn the optimal policy fro...
research
02/22/2021

GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

Offline reinforcement learning approaches can generally be divided to pr...
research
11/21/2022

Data-Driven Offline Decision-Making via Invariant Representation Learning

The goal in offline data-driven decision-making is to synthesize decisions ...
