Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions
This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations. First, we develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of Õ(H^3/2 |S|/ε) under the known transition setting, where H is the planning horizon, |S| is the state space size, and ε is the desired policy value gap. This improves upon the previous best bound of Õ(H^2 |S|/ε^2) for AIL methods and matches the lower bound of Ω̃(H^3/2 |S|/ε) in [Rajaraman et al., 2021] up to a logarithmic factor. The key ingredient of TAIL is a fine-grained estimator for the expert state-action distribution, which explicitly utilizes the transition function information. Second, considering practical settings where the transition functions are usually unknown but environment interaction is allowed, we develop a model-based transition-aware AIL algorithm named MB-TAIL. In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model. The interaction complexity of MB-TAIL is Õ(H^3 |S|^2 |A|/ε^2), which improves upon the best-known result of Õ(H^4 |S|^2 |A|/ε^2) in [Shani et al., 2021]. Finally, our theoretical results are supported by numerical evaluation and detailed analysis on two challenging MDPs.
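To make the model-based step concrete, below is a minimal sketch of the kind of empirical transition model MB-TAIL relies on: a count-based maximum-likelihood estimate of P(s' | s, a) in a tabular MDP, built from transitions collected through environment interaction. The function name, the uniform fallback for unvisited state-action pairs, and the toy data are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def estimate_transition_model(transitions, num_states, num_actions):
    """Count-based maximum-likelihood estimate of P(s' | s, a).

    transitions: iterable of (s, a, s') tuples collected by interacting
    with the environment. Unvisited (s, a) pairs fall back to a uniform
    distribution over next states (one common convention; the paper may
    handle them differently).
    """
    counts = np.zeros((num_states, num_actions, num_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Normalize visit counts into conditional distributions; use a
    # uniform distribution wherever (s, a) was never observed.
    p_hat = np.where(totals > 0,
                     counts / np.maximum(totals, 1),
                     1.0 / num_states)
    return p_hat

# Toy usage: 3 states, 2 actions, a handful of observed transitions.
data = [(0, 0, 1), (0, 0, 1), (0, 1, 2), (1, 0, 2), (1, 0, 0)]
P_hat = estimate_transition_model(data, num_states=3, num_actions=2)
print(P_hat[0, 0])  # [0., 1., 0.]: both samples of (s=0, a=0) led to s'=1
```

Imitation is then performed inside this recovered model P_hat rather than the true environment, which is what lets MB-TAIL trade a bounded number of real interactions for unlimited planning in the empirical MDP.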