Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

08/19/2021
by Andrea Zanette, et al.

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value functions of the actor's policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable, as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data-dependent comparator policy. This guarantee is complemented with a minimax lower bound that is matching up to logarithmic factors.
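To make the pessimism principle concrete, the sketch below shows one common way it is instantiated with linear function approximation: fit a ridge-regression critic on the logged data and subtract an elliptical uncertainty bonus, so that state-action directions poorly covered by the dataset receive pessimistic value estimates. This is only an illustration under assumed linear features; it is not the paper's actual procedure (which solves a sequence of second-order programs), and the function names, the confidence width `beta`, and the softmax actor step are hypothetical choices.

```python
# Minimal sketch of a pessimistic (lower-confidence-bound) critic plus a
# softmax actor step for offline RL with linear features. Illustrative only;
# not the algorithm analyzed in the paper.
import numpy as np

def pessimistic_q_values(phi_data, targets, phi_query, beta=1.0, reg=1e-3):
    """Ridge-regression critic on offline data, penalized by feature uncertainty.

    phi_data:  (n, d) features of logged state-action pairs
    targets:   (n,)   regression targets (e.g. empirical Bellman backups)
    phi_query: (m, d) features of the state-action pairs to evaluate
    beta:      assumed confidence width controlling the pessimism penalty
    """
    n, d = phi_data.shape
    cov = phi_data.T @ phi_data + reg * np.eye(d)      # empirical feature covariance
    w = np.linalg.solve(cov, phi_data.T @ targets)     # ridge critic weights
    q_hat = phi_query @ w                              # point estimate of Q
    # Elliptical bonus: large in directions the dataset covers poorly.
    bonus = np.sqrt(np.einsum("id,dk,ik->i", phi_query, np.linalg.inv(cov), phi_query))
    return q_hat - beta * bonus                        # pessimistic (LCB) values

def softmax_actor_step(q_lcb_per_action, logits, lr=0.1):
    """One natural-gradient-style policy-improvement step against the LCB critic.

    Equivalent to reweighting the softmax policy by exp(lr * Q_LCB); subtracting
    the baseline (expected Q under the current policy) does not change the policy.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    baseline = probs @ q_lcb_per_action
    return logits + lr * (q_lcb_per_action - baseline)
```

Subtracting the bonus means the actor is only credited for value the dataset can certify, which is the intuition behind guarantees stated in terms of the coverage of an arbitrary comparator policy rather than of the optimal one.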


Related research

01/30/2023 - Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a...

02/05/2022 - Adversarially Trained Actor Critic for Offline Reinforcement Learning
We propose Adversarially Trained Actor Critic (ATAC), a new model-free a...

03/02/2021 - Offline Reinforcement Learning with Pseudometric Learning
Offline Reinforcement Learning methods seek to learn a policy from logge...

10/28/2019 - Better Exploration with Optimistic Actor-Critic
Actor-critic methods, a type of model-free Reinforcement Learning, have ...

10/13/2021 - Adapting to Dynamic LEO-B5G Systems: Meta-Critic Learning Based Efficient Resource Scheduling
Low earth orbit (LEO) satellite-assisted communications have been consid...

12/04/2020 - A Hierarchical Deep Actor-Critic Learning Method for Joint Distribution System State Estimation
Due to increasing penetration of volatile distributed photovoltaic (PV) ...

06/20/2023 - Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Warm-Start reinforcement learning (RL), aided by a prior policy obtained...
