Proximal Policy Optimization with Mixed Distributed Training

07/15/2019
by Zhenyu Zhang, et al.

Instability and slowness are two main problems in deep reinforcement learning. Even though proximal policy optimization (PPO) is the state of the art, it still suffers from both. We introduce mixed distributed proximal policy optimization (MDPPO), an improved algorithm based on PPO, and show that it can accelerate and stabilize the training process. In our algorithm, multiple different policies are trained simultaneously, and each of them controls several identical agents that interact with environments. Actions are sampled by each policy separately, as usual, but the trajectories used for training are collected from all agents rather than from the agents of only one policy. We find that if auxiliary trajectories are carefully chosen to train the policies, the algorithm is more stable and converges more quickly, especially in environments with sparse rewards.
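To make the data flow concrete, here is a minimal Python sketch, under stated assumptions, of how one policy's training batch might be assembled from the pooled trajectories and then scored with the standard PPO clipped surrogate. The names `Trajectory`, `build_batch`, `ppo_clip_objective`, and `num_aux` are hypothetical. The selection rule (take the highest-return trajectories collected by the other policies) is an assumption, because the abstract does not say how auxiliary trajectories are chosen. Computing the importance ratio against the log-probability recorded by whichever policy sampled the action is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    policy_id: int               # which policy's agent generated this trajectory
    rewards: List[float]         # per-step rewards
    behavior_logps: List[float]  # log pi_behavior(a_t | s_t), recorded at sampling time

    @property
    def ret(self) -> float:
        # Undiscounted return of the trajectory.
        return sum(self.rewards)


def build_batch(trajs: List[Trajectory], policy_id: int, num_aux: int) -> List[Trajectory]:
    """Return this policy's own trajectories plus `num_aux` auxiliary ones.

    ASSUMPTION (hypothetical rule, not from the paper): auxiliary trajectories
    are the highest-return trajectories collected by the *other* policies.
    """
    own = [t for t in trajs if t.policy_id == policy_id]
    others = [t for t in trajs if t.policy_id != policy_id]
    aux = sorted(others, key=lambda t: t.ret, reverse=True)[:num_aux]
    return own + aux


def ppo_clip_objective(new_logps: List[float], behavior_logps: List[float],
                       advantages: List[float], eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate (to be maximized), averaged over steps.

    ratio = pi_new(a|s) / pi_behavior(a|s); for auxiliary data the behavior
    policy is ASSUMED to be whichever policy sampled the action.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logps, behavior_logps, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy sparse-reward setting: two policies, each controlling two identical agents.
trajs = [
    Trajectory(0, [0.0, 0.0], [-0.7, -0.7]),
    Trajectory(0, [0.0, 0.0], [-0.7, -0.7]),
    Trajectory(1, [0.0, 1.0], [-0.5, -0.6]),  # the only rewarding trajectory
    Trajectory(1, [0.0, 0.0], [-0.7, -0.7]),
]

batch = build_batch(trajs, policy_id=0, num_aux=1)
print(len(batch), [t.policy_id for t in batch])  # 3 [0, 0, 1]
```

In the toy run, neither of policy 0's own trajectories earns any reward, so under the assumed rule its batch picks up the one rewarding trajectory collected by policy 1. This is one plausible reading of why sharing trajectories across policies could help with sparse rewards, not a reproduction of the paper's method or results.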
