Refined Value-Based Offline RL under Realizability and Partial Coverage

02/05/2023
by Masatoshi Uehara, et al.

In offline reinforcement learning (RL), we have no opportunity to explore, so we must assume that the data is sufficient to guide the choice of a good policy. Such assumptions take the form of some combination of coverage, realizability, Bellman completeness, and/or a hard margin (gap). In this work, we propose value-based algorithms for offline RL with PAC guarantees under only partial coverage (specifically, coverage of a single comparator policy) and realizability of the soft (entropy-regularized) Q-function of that single policy and of a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with L^2-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
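
For readers unfamiliar with the term, the following is a minimal sketch of what a soft (entropy-regularized) Q-function looks like in the simplest tabular setting. It uses the standard textbook soft Bellman backup, Q(s,a) = r(s,a) + gamma * E[alpha * log sum_a' exp(Q(s',a')/alpha)], with a known model. This is an assumption made for illustration: the paper's exact regularizer may differ, and this is not the paper's minimax estimator, which works from offline data. The toy MDP, the temperature `alpha`, and all function names are hypothetical.

```python
import math

def soft_value(q_row, alpha):
    """Soft maximum alpha * log(sum_a exp(q_a / alpha)), computed stably."""
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))

def soft_q_iteration(P, R, gamma=0.9, alpha=0.5, iters=500):
    """Iterate the soft Bellman backup on a tabular MDP with a known model.

    P[s][a] is a list of next-state probabilities; R[s][a] is a scalar reward.
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [soft_value(Q[s], alpha) for s in range(n_s)]
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_a)] for s in range(n_s)]
    return Q

def soft_policy(q_row, alpha):
    """Softmax policy induced by one row of soft Q-values."""
    m = max(q_row)
    w = [math.exp((q - m) / alpha) for q in q_row]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical two-state, two-action MDP: action a moves deterministically to state a.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]

Q = soft_q_iteration(P, R)
pi0 = soft_policy(Q[0], 0.5)  # action probabilities in state 0
```

As `alpha` shrinks toward zero, the soft maximum approaches the ordinary maximum, so the soft Q-function approaches the vanilla Q-function that the abstract treats under a soft margin condition.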


research
03/25/2022

Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps

We consider a challenging theoretical problem in offline reinforcement l...
research
11/23/2022

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear functio...
research
02/09/2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Sample-efficiency guarantees for offline reinforcement learning (RL) oft...
research
04/25/2023

Provable benefits of general coverage conditions in efficient online RL with function approximation

In online reinforcement learning (RL), instead of employing standard str...
research
11/01/2022

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Offline reinforcement learning (RL), which refers to decision-making fro...
research
07/13/2021

Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage

We study model-based offline Reinforcement Learning with general functio...
research
12/20/2019

Soft Q-network

When DQN is announced by deepmind in 2013, the whole world is surprised ...
