A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

by Andrea Tirinzoni, et al.

We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an optimization problem, our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts for the dynamics of the MDP. We provide a characterization of our lower-bound through a series of examples illustrating how different MDPs may have significantly different complexity. 1) We first consider a "difficult" MDP instance, where the novel constraint based on the dynamics leads to a larger lower-bound (i.e., a larger regret) compared to the classical analysis. 2) We then show that our lower-bound recovers results previously derived for specific MDP instances. 3) Finally, we show that, in certain "simple" MDPs, the lower bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all. We show that this last result is attainable (up to poly(H) terms, where H is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.




