Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

by Andrew Wagenmaker, et al.

Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the tabular setting, it is well known that this is a more difficult problem than PAC RL – where the agent has access to the reward function during exploration – with optimal sample complexities in the two settings differing by a factor of |𝒮|, the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a d-dimensional linear MDP with sample complexity scaling as 𝒪(d^2/ϵ^2). We then show a matching lower bound of Ω(d^2/ϵ^2) on PAC RL. To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal d dependence in linear MDPs, even in the single-reward PAC setting. Our algorithm relies on a novel procedure which efficiently traverses a linear MDP, collecting samples in any given "feature direction", and enjoys a sample complexity scaling optimally in the (linear MDP equivalent of the) maximal state visitation probability. We show that this exploration procedure can also be applied to solve the problem of obtaining "well-conditioned" covariates in linear MDPs.
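The headline comparison in the abstract can be summarized as follows (a sketch only; here 𝒪̃ hides logarithmic and horizon factors, which the abstract does not make explicit):

```latex
\underbrace{\tilde{\mathcal{O}}\!\left(\frac{d^2}{\epsilon^2}\right)}_{\text{reward-free RL upper bound}}
\quad \text{vs.} \quad
\underbrace{\Omega\!\left(\frac{d^2}{\epsilon^2}\right)}_{\text{PAC (reward-aware) RL lower bound}}
```

Because the reward-free upper bound matches the reward-aware lower bound in its dependence on d and ϵ, the factor-|𝒮| separation known from the tabular setting disappears in linear MDPs.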


