Classical Policy Gradient: Preserving Bellman's Principle of Optimality

06/06/2019
by   Philip S. Thomas, et al.

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
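The new objective itself is defined in the paper, not here. For context only, below is a minimal sketch of the standard finite-horizon policy-gradient (REINFORCE) estimator that such objectives build on: the gradient of the expected episodic return, estimated by Monte Carlo rollouts with a tabular softmax policy. The toy MDP, the horizon, and every constant in the sketch are illustrative assumptions, not the authors' construction.

```python
# Background sketch (not the paper's method): standard finite-horizon
# REINFORCE gradient, grad J(theta) = E[ sum_t grad log pi(a_t|s_t) * G_t ],
# estimated by Monte Carlo on a toy 2-state, 2-action episodic MDP.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 2, 2, 5
# Illustrative transition probabilities P[s, a, s'] and rewards R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def softmax_policy(theta, s):
    """Action probabilities of a tabular softmax policy at state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def sample_episode(theta):
    """Roll out one finite-horizon episode; return a list of (s, a, r)."""
    s, traj = 0, []
    for _ in range(HORIZON):
        probs = softmax_policy(theta, s)
        a = rng.choice(N_ACTIONS, p=probs)
        traj.append((s, a, R[s, a]))
        s = rng.choice(N_STATES, p=P[s, a])
    return traj

def reinforce_gradient(theta, n_episodes=200):
    """Monte Carlo estimate of the gradient of the expected episodic return."""
    grad = np.zeros_like(theta)
    for _ in range(n_episodes):
        traj = sample_episode(theta)
        rewards = [r for (_, _, r) in traj]
        for t, (s, a, _) in enumerate(traj):
            G_t = sum(rewards[t:])            # reward-to-go from time t
            probs = softmax_policy(theta, s)
            grad_log = -probs                 # d log pi(a|s) / d theta[s, :]
            grad_log[a] += 1.0
            grad[s] += grad_log * G_t
    return grad / n_episodes

theta = np.zeros((N_STATES, N_ACTIONS))
for _ in range(50):
    theta += 0.05 * reinforce_gradient(theta)

print("learned policy, state 0:", softmax_policy(theta, 0))
print("learned policy, state 1:", softmax_policy(theta, 1))
```

The estimator above targets the usual expected episodic return; the paper's contribution is a different objective for the same finite-horizon episodic setting, together with an expression for its gradient.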

Related research

05/22/2022 · Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes
We study convex Constrained Markov Decision Processes (CMDPs) in which t...

10/22/2020 · Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
We study the problem of policy optimization for infinite-horizon discoun...

10/18/2018 · Trust Region Policy Optimization of POMDPs
We propose Generalized Trust Region Policy Optimization (GTRPO), a Reinf...

05/30/2023 · Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
We propose a policy gradient algorithm for robust infinite-horizon Marko...

03/04/2021 · On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk
In order to model risk aversion in reinforcement learning, an emerging l...

10/18/2021 · Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient
Improving the resilience of a network protects the system from natural d...

05/26/2023 · A Policy Gradient Method for Confounded POMDPs
In this paper, we propose a policy gradient method for confounded partia...
