Reinforcement Learning in Presence of Discrete Markovian Context Evolution

02/14/2022
by   Hang Ren, et al.

We consider a context-dependent Reinforcement Learning (RL) setting characterized by: a) an unknown finite number of contexts that are not directly observable; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications, and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components makes it possible to infer the number of contexts from data, thus addressing the context cardinality assumption. We then find a representation of the optimal policy that enables efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using the gym environments cart-pole swing-up, drone, and intersection) that our approach succeeds where state-of-the-art methods from other frameworks fail, and we elaborate on the reasons for such failures.
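To make properties b) and c) of the setting concrete, the following is a minimal sketch of a discrete context that evolves as a Markov chain and switches abruptly within an episode. It is an illustration only, not the paper's sticky HDP model or inference procedure: the helper names `sticky_transition_matrix` and `sample_context_trajectory`, the fixed number of contexts (3), and the self-transition probability (0.95) are all assumptions chosen for the example. In the paper's setting the number of contexts is unknown and the contexts are not observed.

```python
import random


def sticky_transition_matrix(num_contexts, stay_prob):
    """Build a row-stochastic matrix with `stay_prob` on the diagonal.

    Hypothetical helper: the remaining mass is spread uniformly over the
    other contexts, which mimics "sticky" (self-favoring) transitions.
    """
    if num_contexts < 2:
        raise ValueError("need at least two contexts")
    if not 0.0 <= stay_prob <= 1.0:
        raise ValueError("stay_prob must lie in [0, 1]")
    off_diag = (1.0 - stay_prob) / (num_contexts - 1)
    return [
        [stay_prob if i == j else off_diag for j in range(num_contexts)]
        for i in range(num_contexts)
    ]


def sample_context_trajectory(transition, start, steps, rng):
    """Sample `steps` context indices from a Markov chain.

    The next context depends only on the current one (Markovian
    evolution), and any step may switch context abruptly.
    """
    contexts = [start]
    for _ in range(steps - 1):
        row = transition[contexts[-1]]
        next_context = rng.choices(range(len(row)), weights=row, k=1)[0]
        contexts.append(next_context)
    return contexts


# Assumed values for illustration: 3 contexts, 200-step episode.
rng = random.Random(0)
P = sticky_transition_matrix(3, 0.95)
trajectory = sample_context_trajectory(P, start=0, steps=200, rng=rng)

# Count abrupt context changes that occurred during the episode.
switches = sum(a != b for a, b in zip(trajectory, trajectory[1:]))
print("context switches within the episode:", switches)
```

In this sketch the agent would see only states and rewards generated under `trajectory[t]`, never the context index itself. Inferring the context sequence, and how many contexts exist at all, is the problem the abstract describes.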

research
10/25/2022

In-context Reinforcement Learning with Algorithm Distillation

We propose Algorithm Distillation (AD), a method for distilling reinforc...
research
02/09/2022

Contextualize Me – The Case for Context in Reinforcement Learning

While Reinforcement Learning (RL) has made great strides towards solving...
research
09/01/2022

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization

A key challenge of continual reinforcement learning (CRL) in dynamic env...
research
03/03/2022

Reinforcement Learning in Possibly Nonstationary Environments

We consider reinforcement learning (RL) methods in offline nonstationary...
research
12/24/2022

An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

One of the key challenges in deploying RL to real-world applications is ...
research
03/16/2023

Recommending the optimal policy by learning to act from temporal data

Prescriptive Process Monitoring is a prominent problem in Process Mining...
research
04/26/2019

Reinforcement Learning Based Orchestration for Elastic Services

Due to the highly variable execution context in which edge services run,...
