Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

02/25/2021
by Nived Rajaraman, et al.

We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space 𝒮. We focus on the known-transition setting, where the learner is provided a dataset of N length-H trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound of O(|𝒮|H^{3/2}/N) on the suboptimality using the Mimic-MD algorithm of Rajaraman et al (2020), which we prove can be implemented efficiently. In contrast, we show that the minimax suboptimality grows as Ω(H^{3/2}/N) when |𝒮| ≥ 3, whereas the unknown-transition setting suffers a larger sharp rate of Θ(|𝒮|H^2/N) (Rajaraman et al (2020)). The lower bound is established by proving a two-way reduction between IL and the problem of estimating the value of the unknown expert policy under any given reward function, as well as by building connections with linear functional estimation from subsampled observations. We further show that, under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term Mimic-Mixture, that provably achieves suboptimality O(1/N) for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality O(√H/N) with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert-optimality assumption in the known-transition setting, whereas Rajaraman et al (2020) showed that it does not help when transitions are unknown.
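
To make the setting concrete, here is a minimal, illustrative Python sketch of the known-transition IL problem the abstract describes: a tabular episodic MDP with a deterministic expert, a dataset of N length-H expert trajectories, and exact policy evaluation by backward induction (possible precisely because the transition is known). The instance sizes, the random dynamics, and the simple behavior-cloning learner are hypothetical stand-ins for exposition; this is not the paper's Mimic-MD or Mimic-Mixture algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance sizes, chosen only for illustration.
S, A, H, N = 5, 3, 10, 50          # states, actions, horizon, number of expert trajectories

# Known, time-inhomogeneous transition P[h, s, a] (a distribution over next states),
# a reward R[h, s, a] in [0, 1], and a deterministic expert policy pi_E[h, s].
P = rng.dirichlet(np.ones(S), size=(H, S, A))
R = rng.random((H, S, A))
pi_E = rng.integers(0, A, size=(H, S))

def policy_value(pi):
    # Exact value of a deterministic policy by backward induction; this is
    # computable because the transition P is known to the learner.
    V = np.zeros(S)
    for h in reversed(range(H)):
        a, idx = pi[h], np.arange(S)
        V = R[h, idx, a] + P[h, idx, a] @ V
    return V[0]                     # value from a fixed initial state 0

def sample_expert_trajectories(n):
    # The dataset: n length-H rollouts of the deterministic expert.
    data = []
    for _ in range(n):
        s, traj = 0, []
        for h in range(H):
            a = pi_E[h, s]
            traj.append((h, s, a))
            s = rng.choice(S, p=P[h, s, a])
        data.append(traj)
    return data

def behavior_cloning(data):
    # Copy the expert's action on every visited (h, s); act arbitrarily elsewhere.
    pi = np.zeros((H, S), dtype=int)
    for traj in data:
        for h, s, a in traj:
            pi[h, s] = a
    return pi

pi_hat = behavior_cloning(sample_expert_trajectories(N))
print("suboptimality J(pi_E) - J(pi_hat) =", policy_value(pi_E) - policy_value(pi_hat))

Because the transition is known, both values are computed exactly, so the printed gap is the suboptimality J(pi_E) - J(pi_hat) whose scaling in |𝒮|, H, and N the paper characterizes; the paper's algorithms exploit the known transition to do strictly better than this naive baseline.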

Related research

09/13/2020 · Toward the Fundamental Limits of Imitation Learning
  Imitation learning (IL) aims to mimic the behavior of an expert policy i...

06/19/2021 · Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions
  This paper is dedicated to designing provably efficient adversarial imit...

12/15/2020 · Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
  We study reinforcement learning (RL) with linear function approximation ...

06/11/2023 · Provably Efficient Adversarial Imitation Learning with Unknown Transitions
  Imitation learning (IL) has proven to be an effective method for learnin...

11/05/2019 · Apprenticeship Learning via Frank-Wolfe
  We consider the applications of the Frank-Wolfe (FW) algorithm for Appre...

04/24/2018 · Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
  We formalize the problem of maximizing the mean-payoff value with high p...

10/05/2021 · TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions
  We consider the minimax query complexity of online planning with a gener...
