A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

11/02/2020
by Philip Amortila, et al.

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon setting. In this note we show that, once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, for which learning is impossible even with an infinite amount of data.
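To make the setting concrete, here is a minimal sketch of what "linear realizability with 1-dimensional features" means in a 2-state discounted MDP. This is a hypothetical illustration of the problem class only, not the paper's actual hard construction: the MDP below (one action per state, a hand-picked reward vector, and features chosen to equal the optimal values) is assumed for demonstration.

```python
# Hypothetical 2-state discounted MDP (NOT the paper's construction):
# one action per state, so the optimal value function V* solves the
# linear Bellman equation V = r + gamma * P V.
import numpy as np

gamma = 0.9                       # discount factor
P = np.array([[0.0, 1.0],         # s0 transitions to s1 deterministically
              [0.0, 1.0]])        # s1 self-loops
r = np.array([1.0, 0.0])          # reward 1 when leaving s0, else 0

# Solve (I - gamma * P) V = r for the exact optimal values.
V = np.linalg.solve(np.eye(2) - gamma * P, r)

# 1-dimensional features chosen so that V* is exactly linear in them:
# phi(s) = V*(s), hence theta = 1 realizes V* = theta * phi.
phi = V.reshape(-1, 1)
theta, *_ = np.linalg.lstsq(phi, V, rcond=None)

print(V)       # prints [1. 0.]
print(theta)   # prints [1.]
```

Realizability only guarantees that such a theta exists; the note's point is that in the discounted setting even this very small, fully realizable instance can be made unlearnable from batch data.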
