Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes

04/01/2023
by Réda Alami, et al.

We consider the problem of learning in a non-stationary reinforcement learning (RL) environment, where the setting can be fully described by a piecewise stationary discrete-time Markov decision process (MDP). We introduce a variant of the Restarted Bayesian Online Change-Point Detection algorithm (R-BOCPD) that operates on input streams originating from the more general multinomial distribution and provides near-optimal theoretical guarantees in terms of false-alarm rate and detection delay. Based on this, we propose an improved version of the UCRL2 algorithm for MDPs with state transition kernel sampled from a multinomial distribution, which we call R-BOCPD-UCRL2. We perform a finite-time performance analysis and show that R-BOCPD-UCRL2 enjoys a favorable regret bound of O(D O √(A T K_T log(T/δ) + K_T log(K_T/δ) / min_ℓ KL(θ^(ℓ+1) ∥ θ^(ℓ)))), where D is the largest MDP diameter from the set of MDPs defining the piecewise stationary MDP setting, O is the finite number of states (constant over all changes), A is the finite number of actions (constant over all changes), K_T is the number of change points up to horizon T, and θ^(ℓ) is the transition kernel during the interval [c_ℓ, c_{ℓ+1}), which we assume to be multinomially distributed over the set of states 𝕆. Interestingly, the performance bound does not directly scale with the variation in MDP state transition distributions and rewards, i.e., it can also model abrupt changes. In practice, R-BOCPD-UCRL2 outperforms the state-of-the-art in a variety of scenarios in synthetic environments. We provide a detailed experimental setup along with a code repository (upon publication) that can be used to easily reproduce our experiments.
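To make the change-point-detection ingredient concrete, the following is a minimal sketch of Bayesian online change-point detection with restarts on a Bernoulli stream (the binary special case of the multinomial streams considered in the abstract). It maintains a run-length posterior under a Beta prior with a constant hazard rate, and restarts when the most likely run length collapses. The function name, the constant-hazard model, and the MAP-drop restart rule are illustrative assumptions for this sketch, not the authors' exact R-BOCPD algorithm or its theoretical thresholds.

```python
import numpy as np

def rbocpd_bernoulli(xs, hazard=0.05, a0=1.0, b0=1.0, drop=10):
    """Illustrative restarted BOCPD on a Bernoulli stream.

    Maintains the run-length posterior P(r_t | x_1:t) with a
    Beta(a0, b0) prior per run length and a constant hazard rate.
    Declares a change point (and restarts from the prior) when the
    MAP run length drops by more than `drop` in one step.
    """
    r = np.array([1.0])                      # run-length posterior
    a, b = np.array([a0]), np.array([b0])    # Beta params per run length
    prev_map, change_points = None, []
    for t, x in enumerate(xs):
        # Predictive probability of x under each run length's posterior.
        pred = (a if x == 1 else b) / (a + b)
        growth = r * pred * (1.0 - hazard)   # the current run continues
        cp_mass = np.sum(r * pred) * hazard  # a change point occurs now
        r = np.append(cp_mass, growth)
        r /= r.sum()
        a = np.append(a0, a + x)             # conjugate Beta updates
        b = np.append(b0, b + (1 - x))
        map_rl = int(np.argmax(r))
        if prev_map is not None and map_rl < prev_map - drop:
            change_points.append(t)          # declare a change and restart
            r, a, b = np.array([1.0]), np.array([a0]), np.array([b0])
            prev_map = None
        else:
            prev_map = map_rl
    return change_points
```

On a stream that flips from all zeros to all ones, the long-run-length hypotheses assign vanishing probability to the new symbol, so posterior mass shifts to short run lengths within a step or two of the change, which is what drives the short detection delay the abstract refers to.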

Related research

05/14/2019 · Variational Regret Bounds for Reinforcement Learning
We consider undiscounted reinforcement learning in Markov decision proce...

05/20/2021 · Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
Non-stationary environments are challenging for reinforcement learning a...

10/07/2020 · Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs
We consider model-free reinforcement learning (RL) in non-stationary Mar...

06/05/2017 · A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming
Non-stationary domains, that change in unpredicted ways, are a challenge...

04/06/2019 · A Bayesian Theory of Change Detection in Statistically Periodic Random Processes
A new class of stochastic processes called independent and periodically ...

01/21/2022 · Meta Learning MDPs with Linear Transition Models
We study meta-learning in Markov Decision Processes (MDP) with linear tr...

10/18/2019 · Autonomous exploration for navigating in non-stationary CMPs
We consider a setting in which the objective is to learn to navigate in ...
