Policy Gradient Method For Robust Reinforcement Learning

05/15/2022
by Shaofeng Zou, et al.

This paper develops the first policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning aims to learn a policy that is robust to model mismatch between the simulator and the real environment. We first develop the robust policy (sub-)gradient, which applies to any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method and show that its complexity to achieve an ϵ-global optimum is 𝒪(ϵ^{-3}). We then extend our methodology to the general model-free setting and design a robust actor-critic method with a differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity in the tabular setting. Finally, we provide simulation results to demonstrate the robustness of our methods.
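
To make "robust to model mismatch" concrete, the following is a minimal sketch of robust policy evaluation for a tabular MDP, the kind of worst-case value a robust policy gradient would differentiate. The abstract does not state which uncertainty set is used, so the sketch assumes a contamination-style set in which, with probability R, the next state is chosen adversarially. The reward-maximization convention, the function name `robust_policy_evaluation`, and all parameter names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def robust_policy_evaluation(P, r, pi, gamma=0.9, R=0.1, tol=1e-10, max_iter=10_000):
    """Worst-case value of a fixed policy (illustrative sketch).

    ASSUMPTION: the uncertainty set is {(1 - R) * P + R * q : q any distribution},
    so the adversary's best response puts all of q on the lowest-value state.

    P  : (S, A, S) nominal transition kernel
    r  : (S, A)    rewards
    pi : (S, A)    policy, each row sums to 1
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        worst = V.min()  # adversarial next-state value
        # Robust Bellman backup: mix nominal expectation with the worst case.
        q = r + gamma * ((1.0 - R) * (P @ V) + R * worst)  # shape (S, A)
        V_new = (pi * q).sum(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

With `R=0` this reduces to standard policy evaluation on the nominal model; increasing `R` can only lower the returned values, since the backup mixes in the minimum of `V`.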

research
08/29/2019

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

Policy gradient methods with actor-critic schemes demonstrate tremendous...
research
05/04/2021

On the Linear convergence of Natural Policy Gradient Algorithm

Markov Decision Processes are classically solved using Value Iteration a...
research
03/27/2016

Negative Learning Rates and P-Learning

We present a method of training a differentiable function approximator f...
research
05/05/2023

Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient

Motivated by the recent success of Machine Learning tools in wireless co...
research
02/23/2021

Mixed Policy Gradient

Reinforcement learning (RL) has great potential in sequential decision-m...
research
05/27/2018

Contextual Policy Optimisation

Policy gradient methods have been successfully applied to a variety of r...
