Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

08/19/2022
by Rong-Jun Qin, et al.

This paper addresses policy learning in non-stationary environments and games with continuous actions. Instead of the classical reward-maximization mechanism, we draw on the ideas of follow-the-regularized-leader (FTRL) and mirror descent (MD) updates to propose PORL, a no-regret-style reinforcement learning algorithm for continuous-action tasks. We prove that PORL has a last-iterate convergence guarantee, which is important for adversarial and cooperative games. Empirical studies show that in stationary environments, such as MuJoCo locomotion control tasks, PORL performs as well as, if not better than, the soft actor-critic (SAC) algorithm; in non-stationary environments, including dynamical environments, adversarial training, and competitive games, PORL outperforms SAC in both final policy performance and training stability.


research
05/07/2021

Context-Based Soft Actor Critic for Environments with Non-stationary Dynamics

The performance of deep reinforcement learning methods prone to degenera...
research
04/17/2017

Pseudorehearsal in actor-critic agents

Catastrophic forgetting has a serious impact in reinforcement learning, ...
research
11/06/2022

Decentralized Policy Optimization

The study of decentralized learning or independent learning in cooperati...
research
08/18/2023

A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments

Multi-objective Markov decision processes are a special kind of multi-ob...
research
01/17/2017

Intrinsically Motivated Acquisition of Modular Slow Features for Humanoids in Continuous and Non-Stationary Environments

A compact information-rich representation of the environment, also calle...
research
04/25/2018

Multiagent Soft Q-Learning

Policy gradient methods are often applied to reinforcement learning in c...
research
07/13/2023

The complexity of non-stationary reinforcement learning

The problem of continual learning in the domain of reinforcement learnin...