Policy Optimization over General State and Action Spaces

by Guanghui Lan, et al.

Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and iteratively update the policy at each state. This rules out the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to handle general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that explicit policy parameterization is not needed at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish a linear rate of convergence to global optimality, or sublinear convergence to stationarity, for these methods when applied to different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods for general-state RL problems with either finite or continuous action spaces. To the best of our knowledge, these algorithmic frameworks as well as their convergence analyses appear to be new in the literature.
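As a concrete illustration of the update at the heart of policy mirror descent, the sketch below runs PMD with a KL (entropy) Bregman divergence on a made-up two-state, two-action MDP with exact policy evaluation. The MDP data, stepsize, and iteration count are all hypothetical choices for illustration; the paper's contribution concerns general state and action spaces with function approximation, which this tabular toy does not attempt to capture.

```python
import numpy as np

gamma = 0.9          # discount factor
n_s, n_a = 2, 2      # states, actions (tiny toy problem)

# Cost c(s, a) and transition kernel P[s, a, s'] -- made-up numbers.
c = np.array([[1.0, 0.0],
              [0.0, 1.0]])
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])

def evaluate(pi):
    """Exact policy evaluation: V = (I - gamma * P_pi)^{-1} c_pi."""
    P_pi = np.einsum('sa,sat->st', pi, P)   # state transition under pi
    c_pi = np.einsum('sa,sa->s', pi, c)     # expected one-step cost
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, c_pi)
    Q = c + gamma * np.einsum('sat,t->sa', P, V)
    return V, Q

pi = np.full((n_s, n_a), 0.5)   # uniform initial policy
eta = 1.0                       # stepsize (illustrative choice)
for _ in range(200):
    V, Q = evaluate(pi)
    # KL-based PMD step: pi_{k+1}(a|s) is proportional to
    # pi_k(a|s) * exp(-eta * Q(s, a)); shifting Q by its row-wise
    # minimum only changes the normalization and avoids underflow.
    pi = pi * np.exp(-eta * (Q - Q.min(axis=1, keepdims=True)))
    pi /= pi.sum(axis=1, keepdims=True)

V_final, _ = evaluate(pi)
print(np.round(V_final, 3))
```

Because the zero-cost actions here dominate at every iterate, the multiplicative-weights update concentrates the policy on them and the cost-to-go is driven toward zero; with exact evaluation this mirrors the linear-convergence regime discussed in the abstract.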

