On the Effective Horizon of Inverse Reinforcement Learning

07/13/2023
by Yiqing Xu, et al.

Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function, and then match this policy against expert demonstrations. The time horizon plays a critical role in determining both the accuracy of the reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of the induced policy class, so a shorter horizon mitigates overfitting when data is limited. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn the reward and the effective horizon jointly than to learn the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.
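To make the role of the horizon concrete, here is a minimal sketch of finite-horizon planning for one hypothesized reward on a toy problem. It is an illustration only, not the paper's algorithm or experimental setup: the five-state chain, the reward values in `REWARDS`, and the helper name `greedy_first_action` are all assumptions made for this example. It shows that the same reward induces different policies under different planning horizons, which is why the horizon affects which policies an IRL procedure can match to the demonstrations.

```python
# Toy deterministic chain with states 0..4 (hypothetical example).
# Action 0 moves left, action 1 moves right; moves are clamped at the ends.
# The reward is collected for the state the agent lands in.
REWARDS = [0.2, 0.0, 0.0, 0.0, 1.0]  # small reward nearby, large reward far away


def greedy_first_action(rewards, horizon, start, gamma=1.0):
    """Return the first action (0 = left, 1 = right) of a policy that is
    optimal for `rewards` when planning `horizon` steps ahead from `start`."""
    n = len(rewards)

    def step(state, action):
        return max(state - 1, 0) if action == 0 else min(state + 1, n - 1)

    # Backward induction: after the loop, `values` holds V_{horizon-1}.
    values = [0.0] * n  # V_0: no steps left, no reward to collect
    for _ in range(horizon - 1):
        values = [
            max(rewards[step(s, a)] + gamma * values[step(s, a)] for a in (0, 1))
            for s in range(n)
        ]

    # One-step lookahead from the start state; ties go to action 0.
    q = [rewards[step(start, a)] + gamma * values[step(start, a)] for a in (0, 1)]
    return 0 if q[0] >= q[1] else 1


# First action from state 1 for planning horizons 1 through 5.
first_actions = {T: greedy_first_action(REWARDS, T, start=1) for T in range(1, 6)}
print(first_actions)
```

In this toy, short horizons favor the nearby small reward and longer horizons favor the distant large one, so the planning horizon alone changes the induced policy even though the reward is fixed.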


