Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

10/26/2021
by Siyuan Zhang, et al.

How to select among policies and value functions produced by different training algorithms in offline reinforcement learning (RL), which is crucial for hyperparameter tuning, is an important open question. Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence additional hyperparameters, creating a chicken-and-egg problem. In this paper, we design hyperparameter-free algorithms for policy selection based on BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate their effectiveness on discrete-action benchmarks such as Atari. To address the performance degradation caused by poor critics in continuous-action domains, we further combine BVFT with OPE to get the best of both worlds and, as a by-product, obtain a hyperparameter-tuning method for Q-function-based OPE with theoretical guarantees.
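To make the value-function selection step more concrete, here is a minimal Python sketch of the pairwise tournament idea behind BVFT as we understand it from [XJ21]. It is not the authors' implementation. The sketch assumes tabular candidate Q-functions, a logged dataset of (state, action, reward, next_state, done) transitions, and a fixed discretization resolution `eps`. The function names are hypothetical, `eps` is an illustrative parameter, and the paper's way of avoiding tuning it is not reproduced here. For each pair of candidates, state-action pairs are grouped into cells by discretizing both candidates' Q-values. The empirical Bellman targets are then averaged within each cell. A candidate's score is its worst-case projected Bellman error over all partners, and the candidate with the lowest score is selected.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A logged transition: (state, action, reward, next_state, done).
Transition = Tuple[int, int, float, int, bool]
# A candidate Q-function stored as a table: q[state][action] (assumed tabular setting).
QTable = Sequence[Sequence[float]]


def bellman_targets(q: QTable, data: Sequence[Transition], gamma: float) -> List[float]:
    """Empirical Bellman backup r + gamma * max_a' q(s', a') for every transition."""
    targets = []
    for s, a, r, s_next, done in data:
        bootstrap = 0.0 if done else max(q[s_next])
        targets.append(r + gamma * bootstrap)
    return targets


def projected_bellman_error(q: QTable, partner: QTable, data: Sequence[Transition],
                            gamma: float, eps: float) -> float:
    """RMS distance between q and its Bellman backup projected onto functions that are
    piecewise constant over cells built by discretizing (q, partner) at resolution eps."""
    targets = bellman_targets(q, data, gamma)
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (s, a, _, _, _) in enumerate(data):
        key = (math.floor(q[s][a] / eps), math.floor(partner[s][a] / eps))
        cells[key].append(i)
    squared_error = 0.0
    for indices in cells.values():
        # Least-squares projection onto a piecewise-constant class = per-cell mean of targets.
        cell_mean = sum(targets[i] for i in indices) / len(indices)
        for i in indices:
            s, a = data[i][0], data[i][1]
            squared_error += (q[s][a] - cell_mean) ** 2
    return math.sqrt(squared_error / len(data))


def bvft_select(candidates: Sequence[QTable], data: Sequence[Transition],
                gamma: float, eps: float) -> Tuple[int, List[float]]:
    """Score each candidate by its worst-case projected error over all partners and
    return the index of the lowest-scoring candidate together with all scores."""
    scores = [
        max(projected_bellman_error(q, partner, data, gamma, eps) for partner in candidates)
        for q in candidates
    ]
    best = min(range(len(candidates)), key=lambda i: scores[i])
    return best, scores


if __name__ == "__main__":
    # Toy one-step problem: action 0 pays 1, action 1 pays 0, every episode ends.
    data = [(0, 0, 1.0, 0, True), (0, 1, 0.0, 0, True)]
    good = [[1.0, 0.0]]
    bad = [[0.0, 1.0]]
    best, scores = bvft_select([good, bad], data, gamma=0.9, eps=0.5)
    print(best, scores)  # 0 [0.0, 1.0]
```

Running the script prints `0 [0.0, 1.0]`. The correct candidate has zero projected error under every partition, so it is selected. Taking the maximum over partners means a candidate must look Bellman-consistent on every partition induced by the other candidates, not only on one that happens to favor it.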
