A2C is a special case of PPO

05/18/2022
by Shengyi Huang, et al.

Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been widely used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms, because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and a pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment showing that A2C and PPO produce exactly the same models when other settings are controlled.
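The abstract does not spell out the argument, so the following is only an illustrative sketch of one way the claim can be checked numerically, not the paper's own code. It assumes a single gradient step taken on data collected by the current policy, so the probability ratio in PPO's clipped objective equals 1 and the clipping is inactive. The three-action softmax policy, the parameter values, and all function names are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_loss(logits, action, advantage):
    """A2C policy loss for one sample: -A * log pi(a)."""
    return -advantage * math.log(softmax(logits)[action])

def ppo_clip_loss(logits, old_logits, action, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss for one sample."""
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return -min(ratio * advantage, clipped * advantage)

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical policy parameters and a single (action, advantage) sample.
theta = [0.3, -1.2, 0.8]
action, advantage = 1, -0.7

# The old policy is the current policy (first and only update on this batch),
# so the ratio is 1 at theta and the clip has no effect on the gradient.
grad_a2c = numerical_grad(lambda t: a2c_loss(t, action, advantage), theta)
grad_ppo = numerical_grad(lambda t: ppo_clip_loss(t, theta, action, advantage), theta)

max_diff = max(abs(a - b) for a, b in zip(grad_a2c, grad_ppo))
print("largest gradient difference:", max_diff)
```

Under these assumptions the two gradients agree up to finite-difference error. The sketch covers only the policy loss; matching the remaining training settings between the two algorithms is a separate matter that it does not address.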
