Near-Optimal Adversarial Reinforcement Learning with Switching Costs

by Ming Shi, et al.

Switching costs, which capture the cost of changing policies, are regarded as a critical metric in reinforcement learning (RL), in addition to the standard metric of losses (or rewards). However, existing studies on switching costs (with a coefficient β that is strictly positive and independent of T) have mainly focused on static RL, where the loss distribution is assumed to be fixed during the learning process. Practical scenarios in which the loss distribution is non-stationary or even adversarial are therefore not covered. While adversarial RL better models such scenarios, an open problem remains: how to develop a provably efficient algorithm for adversarial RL with switching costs? This paper makes the first effort towards solving this problem. First, we provide a regret lower bound showing that the regret of any algorithm must be at least Ω̃((HSA)^{1/3} T^{2/3}), where T, S, A and H are the number of episodes, states, actions and layers in each episode, respectively. Our lower bound indicates that, due to the fundamental challenge of switching costs in adversarial RL, the best regret achieved in static RL with switching costs (as well as in adversarial RL without switching costs), whose dependency on T is Õ(√T), is no longer achievable. Moreover, we propose two novel switching-reduced algorithms whose regrets match our lower bound when the transition function is known, and match it within a small factor of Õ(H^{1/3}) when the transition function is unknown. Our regret analysis demonstrates the near-optimal performance of both algorithms.
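For reference, the bounds stated above can be written side by side. This is only a restatement of the abstract, not a formula quoted from the paper body: the unknown-transition bound is obtained here by multiplying the lower bound by the stated Õ(H^{1/3}) factor, which is an assumption about how that factor enters, and the last line simply compares the T^{2/3} rate with the √T rate.

```latex
% Restatement of the bounds in the abstract; \tilde hides logarithmic factors.
% The third line assumes the \tilde{O}(H^{1/3}) gap multiplies the lower bound directly.
\begin{align*}
\text{Lower bound (any algorithm):} \quad
  & \tilde{\Omega}\big((HSA)^{1/3}\, T^{2/3}\big) \\
\text{Upper bound, known transitions:} \quad
  & \tilde{O}\big((HSA)^{1/3}\, T^{2/3}\big) \\
\text{Upper bound, unknown transitions:} \quad
  & \tilde{O}\big(H^{1/3} \cdot (HSA)^{1/3}\, T^{2/3}\big)
    = \tilde{O}\big(H^{2/3} (SA)^{1/3}\, T^{2/3}\big) \\
\text{Gap to the } \sqrt{T} \text{ rate:} \quad
  & \frac{T^{2/3}}{T^{1/2}} = T^{1/6} \to \infty \quad \text{as } T \to \infty
\end{align*}
```

The last line shows why a √T-type regret cannot be recovered: the ratio between the two rates grows polynomially in the number of episodes T.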




