PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

10/05/2018
by Perttu Hämäläinen, et al.

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, with continuous state and action spaces and a Gaussian policy (a setting common in computer animation and robotics), PPO is prone to getting stuck in local optima. In this paper, we observe that PPO tends to shrink the exploration variance prematurely, which slows progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for the policy mean and variance and training the two in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks.
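
The variance behaviour described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the neural networks with two scalars (`mu`, `sigma`), uses a stateless one-dimensional objective, and assumes a CMA-ES-style update in which only better-than-average actions are kept, the variance is fitted first around the old mean, and the mean is fitted second. The objective, the function names, the `min_sigma` floor, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def reward(action, target=5.0):
    """Toy objective with a single peak at `target` (hypothetical, not from the paper)."""
    return -(action - target) ** 2


def run(iterations=50, samples=200, mu=0.0, sigma=0.1, min_sigma=1e-3, seed=0):
    """Adapt a 1-D Gaussian N(mu, sigma^2) with separate variance and mean passes."""
    rng = random.Random(seed)
    history = [(mu, sigma)]
    for _ in range(iterations):
        actions = [rng.gauss(mu, sigma) for _ in range(samples)]
        rewards = [reward(a) for a in actions]
        baseline = sum(rewards) / samples
        # Assumption: keep only better-than-average actions, weighted by advantage.
        good = [(a, r - baseline) for a, r in zip(actions, rewards) if r > baseline]
        total = sum(w for _, w in good)
        if total <= 0.0:
            # No usable samples this round; leave the distribution unchanged.
            history.append((mu, sigma))
            continue
        # Pass 1: fit the variance around the OLD mean, so a consistent shift
        # of the good actions (a slope) inflates the variance.
        var = sum(w * (a - mu) ** 2 for a, w in good) / total
        # Pass 2: fit the mean to the same weighted actions.
        mu = sum(w * a for a, w in good) / total
        # Assumption: a small floor keeps sigma from collapsing to zero.
        sigma = max(math.sqrt(var), min_sigma)
        history.append((mu, sigma))
    return history


history = run()
final_mu, final_sigma = history[-1]
peak_sigma = max(s for _, s in history)
print(f"final mean {final_mu:.3f}, final sigma {final_sigma:.4f}, peak sigma {peak_sigma:.3f}")
```

In this sketch the standard deviation starts small, grows while the mean is still climbing the slope toward the peak, and contracts again once the samples straddle the optimum, which is the qualitative pattern the abstract attributes to PPO-CMA.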


research
04/25/2019

Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning

We consider continuous-time Mean-variance (MV) portfolio optimization pr...
research
04/25/2019

Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

We approach the continuous-time mean-variance (MV) portfolio selection w...
research
12/13/2022

PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

Proximal Policy Optimization (PPO) is a highly popular policy-based deep...
research
05/30/2023

Policy Optimization for Continuous Reinforcement Learning

We study reinforcement learning (RL) in the setting of continuous time a...
research
03/06/2018

Smoothed Action Value Functions for Learning Gaussian Policies

State-action value functions (i.e., Q-values) are ubiquitous in reinforc...
research
07/26/2019

Large scale continuous-time mean-variance portfolio allocation via reinforcement learning

We propose to solve large scale Markowitz mean-variance (MV) portfolio a...
research
02/12/2018

Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control

In this work we introduce the application of black-box quantum control a...
