Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs

10/07/2020
by   Weichao Mao, et al.

We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition distributions are allowed to vary over time, either gradually or abruptly, as long as their cumulative variation magnitude does not exceed certain budgets. We propose an algorithm, named Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), for this setting, which adopts a simple restarting strategy and an extra optimism term. Our algorithm outperforms the state-of-the-art (model-based) solution in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret of O(S^{1/3} A^{1/3} Δ^{1/3} H T^{2/3}), where S and A are the numbers of states and actions, respectively, Δ > 0 is the variation budget, H is the number of steps per episode, and T is the total number of steps. We further show that our algorithm is near-optimal by establishing an information-theoretic lower bound of Ω(S^{1/3} A^{1/3} Δ^{1/3} H^{2/3} T^{2/3}), which to the best of our knowledge is the first impossibility result for non-stationary RL in general.


Related research

- 11/19/2022 · Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design
  We study risk-sensitive reinforcement learning (RL) based on an entropic...

- 06/01/2023 · Non-stationary Reinforcement Learning under General Function Approximation
  General function approximation is a powerful tool to handle large state ...

- 06/30/2020 · Dynamic Regret of Policy Optimization in Non-stationary Environments
  We consider reinforcement learning (RL) in episodic MDPs with adversaria...

- 03/10/2023 · Provably Efficient Model-Free Algorithms for Non-stationary CMDPs
  We study model-free reinforcement learning (RL) algorithms in episodic n...

- 04/01/2023 · Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
  We consider the problem of learning in a non-stationary reinforcement le...

- 05/20/2021 · Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
  Non-stationary environments are challenging for reinforcement learning a...

- 10/18/2019 · Autonomous exploration for navigating in non-stationary CMPs
  We consider a setting in which the objective is to learn to navigate in ...
