Policy Optimization for Continuous Reinforcement Learning

05/30/2023
by Hanyang Zhao, et al.

We study reinforcement learning (RL) in continuous time and space, over an infinite horizon with a discounted objective, where the underlying dynamics are driven by a stochastic differential equation. Building on recent advances in the continuous approach to RL, we develop a notion of occupation time (specifically for a discounted objective) and show how it can be used to derive performance-difference and local-approximation formulas. We further extend these results to illustrate their applications in policy gradient (PG) and trust region policy optimization/proximal policy optimization (TRPO/PPO) methods, which are familiar and powerful tools in the discrete RL setting but remain under-developed in continuous RL. Through numerical experiments, we demonstrate the effectiveness and advantages of our approach.
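To make the setting concrete, here is a minimal sketch of how a discounted objective and a discounted occupation time could be estimated by simulation. It is not the paper's method: the dynamics dX = a dt + sigma dB, the Euler-Maruyama discretization, the function name `discounted_estimates`, and all parameter values are assumptions chosen for illustration. The occupation time is taken here to be the expected discounted time spent in a set, E ∫ e^{-beta t} 1{X_t in A} dt, which is one natural reading of the abstract rather than the paper's exact definition.

```python
import numpy as np

def discounted_estimates(policy, reward, in_set, x0=0.0, beta=1.0, sigma=0.5,
                         dt=0.01, horizon=10.0, n_paths=2000, seed=0):
    """Monte Carlo estimates of a discounted value and occupation time.

    Assumed (hypothetical) dynamics: dX = a dt + sigma dB, with a = policy(X).
    The infinite horizon is truncated at `horizon`, which is a reasonable
    approximation when exp(-beta * horizon) is negligible.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(horizon / dt)
    x = np.full(n_paths, x0, dtype=float)
    value = np.zeros(n_paths)
    occupation = np.zeros(n_paths)
    for k in range(n_steps):
        disc = np.exp(-beta * k * dt)            # discount factor e^{-beta t}
        a = policy(x)
        value += disc * reward(x, a) * dt        # Riemann sum of discounted reward
        occupation += disc * in_set(x) * dt      # discounted time spent in the set
        # Euler-Maruyama step for the assumed SDE
        x = x + a * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return value.mean(), occupation.mean()

# Usage: a linear feedback policy with a quadratic cost, and the set |x| < 0.5.
v, occ = discounted_estimates(
    policy=lambda x: -x,
    reward=lambda x, a: -x**2,
    in_set=lambda x: np.abs(x) < 0.5,
)
```

Under this reading, the occupation time of the whole state space is ∫ e^{-beta t} dt = 1/beta, so dividing by that total mass gives a probability distribution over states, the continuous-time counterpart of the discounted state-visitation distribution used in discrete-time PG and TRPO/PPO analyses.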
