A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence

01/30/2023
by Carlo Alfano, et al.

Modern policy optimization methods in applied reinforcement learning, such as Trust Region Policy Optimization and Policy Mirror Descent, are often based on the policy gradient framework. While theoretical guarantees have been established for this class of algorithms, particularly in the tabular setting, the use of a general parametrization scheme remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parametrizations. The policy class induced by our scheme recovers known classes, e.g. softmax, and generates new ones, depending on the choice of the mirror map. For a general mirror map and parametrization class, we establish the quasi-monotonicity of the updates in the value function, prove global linear convergence rates, and bound the total expected Bregman divergence of the algorithm along its path. To showcase the ability of our framework to accommodate general parametrization schemes, we present a case study involving shallow neural networks.
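As a rough illustration of how a mirror map can induce the softmax class, the sketch below performs mirror descent steps on the probability simplex for a single state, using the negative-entropy mirror map. It covers only the simplest tabular case and is not the general parametrized algorithm of the paper. The function name `pmd_step_negative_entropy`, the fixed `q_values`, and the step size are all assumptions made for this example.

```python
import math


def pmd_step_negative_entropy(policy, q_values, step_size):
    """One tabular mirror-descent step at a single state (hypothetical helper).

    With the negative-entropy mirror map, the Bregman divergence is the KL
    divergence, and the update
        argmax_p  step_size * <q, p> - KL(p || policy)
    has the closed form  p(a) proportional to policy(a) * exp(step_size * q(a)).
    """
    max_q = max(q_values)  # shift by the max for numerical stability
    weights = [
        p * math.exp(step_size * (q - max_q))
        for p, q in zip(policy, q_values)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed toy setup: three actions, action values held fixed across steps.
policy = [1 / 3, 1 / 3, 1 / 3]
q_values = [1.0, 0.0, -1.0]
for _ in range(50):
    policy = pmd_step_negative_entropy(policy, q_values, step_size=0.5)
```

Starting from the uniform policy with fixed action values, the iterates stay in the softmax family: after `T` steps the policy is the softmax of `T * step_size * q_values`. In an actual policy optimization loop the action values would be re-estimated for the current policy at every step.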


