A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

07/18/2022
by   Philip Amortila, et al.

This paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable. It has recently been shown that, even under this seemingly strong assumption and with access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires 𝒪̃(d) expert queries and a poly(d, H, |𝒜|, 1/ε) amount of exploratory samples to provably recover an ε-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly little expert input. Compared to prior imitation learning (IL) approaches, our required number of expert demonstrations is independent of H and logarithmic in 1/ε, whereas all prior work required at least linear factors of both in addition to the same dependence on d. Towards establishing the minimal number of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially bounded (in terms of d, H, and |𝒜|) requires at least Ω̃(√d) oracle calls to recover a policy competing with the expert's value function. Under the weaker assumption that the expert's policy is linear, we show that the lower bound increases to Ω̃(d).
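As a rough intuition for why a query budget scaling with the feature dimension d (rather than with the horizon or the total number of exploration steps) can suffice, the following toy sketch illustrates an elliptical-potential argument. This is my own illustration, not the paper's Delphi algorithm: the query rule, threshold, and regularizer are all assumptions chosen for the demo. The learner consults the "expert" only when the current d-dimensional feature direction is novel relative to previously queried directions; the elliptical potential lemma bounds the number of such novel directions by roughly O(d log T).

```python
import numpy as np

# Hypothetical sketch (not the paper's Delphi algorithm): query the expert
# only when the feature direction phi is "novel", as measured by the
# elliptical norm phi^T Lambda^{-1} phi. By the elliptical potential lemma,
# the number of steps where this exceeds a fixed threshold is ~O(d log T),
# independent of the total number of exploration steps T.

def count_expert_queries(features, threshold=0.5, reg=1.0):
    d = features.shape[1]
    cov = reg * np.eye(d)  # regularized covariance of queried feature directions
    queries = 0
    for phi in features:
        novelty = phi @ np.linalg.solve(cov, phi)
        if novelty > threshold:        # direction not yet well covered
            queries += 1               # ask the expert at this state
            cov += np.outer(phi, phi)  # direction is now covered
    return queries

rng = np.random.default_rng(0)
d, T = 10, 5000
feats = rng.normal(size=(T, d))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-norm features
print(count_expert_queries(feats))  # small relative to T: ~O(d log T) queries
```

The same mechanism underlies many linear-bandit and linear-RL analyses: once every feature direction has been covered a few times, no state can trigger a further query, so the budget depends on d and only logarithmically on the number of steps.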


Related research

- Inverse Reinforcement Learning without Reinforcement Learning (03/26/2023)
- Linear Reinforcement Learning with Ball Structure Action Space (11/14/2022)
- On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function (02/03/2021)
- Active Exploration for Inverse Reinforcement Learning (07/18/2022)
- An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap (03/23/2021)
- Active Policy Improvement from Multiple Black-box Oracles (06/17/2023)
- Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning (06/07/2021)
