Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

06/15/2023
by Yunfan Li, et al.

Policy optimization methods are powerful algorithms in Reinforcement Learning (RL) because of their flexibility in policy parameterization and their ability to handle model misspecification. However, these methods usually suffer from slow convergence rates and poor sample complexity. Hence it is important to design provably sample-efficient algorithms for policy optimization. Yet recent advances on this problem have only been successful in the tabular and linear settings, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder dimension and online sensitivity sampling, to design a low-switching, sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation. We show that our algorithm obtains an ε-optimal policy with only O(poly(d)/ε^3) samples, where ε is the suboptimality gap and d is a complexity measure of the function class approximating the policy. This drastically improves the previously best-known sample bound for policy optimization algorithms, O(poly(d)/ε^8). Moreover, we empirically test our theory with deep neural nets to show the benefits of the theoretical inspiration.
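The abstract credits online sensitivity sampling for keeping the number of policy switches low. The sketch below is a simplified, hypothetical linear-feature analogue, not the LPO algorithm: it assumes the sensitivity of a point can be approximated by its ridge leverage score with respect to the importance-weighted covariance of the points kept so far, and it counts every kept point as one policy switch. The function name `online_leverage_sampling` and the parameters `lam` and `c` are illustrative assumptions.

```python
import random


def online_leverage_sampling(features, lam=1.0, c=2.0, seed=0):
    """Hypothetical linear analogue of online sensitivity sampling.

    Assumption: the sensitivity of x_t is approximated by the ridge leverage
    score x^T (lam*I + Sigma)^{-1} x, where Sigma is the importance-weighted
    covariance of the points kept so far. A point is kept with probability
    p = min(1, c * score); the (hypothetical) policy is recomputed only when
    a point is kept, so the return value counts policy switches.
    """
    rng = random.Random(seed)
    d = len(features[0])
    # A stores (lam*I + Sigma)^{-1}; it starts as (lam*I)^{-1}.
    A = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    switches = 0
    for x in features:
        Ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(d)]
        score = sum(x[i] * Ax[i] for i in range(d))  # leverage score x^T A x
        p = min(1.0, c * score)
        if rng.random() < p:
            # Keep x with importance weight 1/p: Sigma += (1/p) x x^T.
            # Sherman-Morrison with s = 1/p:
            # (S + s x x^T)^{-1} = A - s (Ax)(Ax)^T / (1 + s x^T A x)
            s = 1.0 / p
            denom = 1.0 + s * score
            for i in range(d):
                for j in range(d):
                    A[i][j] -= s * Ax[i] * Ax[j] / denom
            switches += 1  # a newly kept point triggers a policy update
    return switches


if __name__ == "__main__":
    rng = random.Random(1)
    T, d = 2000, 3
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
    print(online_leverage_sampling(data), "policy switches over", T, "steps")
```

Because the leverage scores shrink roughly like d/t as data accumulates, only a small fraction of the T points is kept, which is the mechanism behind a low switching cost. In the general non-linear setting the abstract targets, a sensitivity defined over the function class would play the role of the leverage score.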



Related research
05/18/2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in re...
02/17/2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Policy gradient gives rise to a rich class of reinforcement learning (RL...
12/12/2019

Provably Efficient Exploration in Policy Optimization

While policy-based reinforcement learning (RL) achieves tremendous succe...
03/22/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Policy optimization methods remain a powerful workhorse in empirical Rei...
06/14/2021

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Designing provably efficient algorithms with general function approximat...
12/12/2022

Variance-Reduced Conservative Policy Iteration

We study the sample complexity of reducing reinforcement learning to a s...
12/13/2021

A Benchmark for Low-Switching-Cost Reinforcement Learning

A ubiquitous requirement in many practical reinforcement learning (RL) a...
