Policy Gradient Methods in the Presence of Symmetries and State Abstractions

by Prakash Panangaden, et al.

Reinforcement learning on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of MDP homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the presence of such symmetries. We demonstrate the effectiveness of our method on our environments, as well as on challenging visual control tasks from the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance, and the visualizations of the latent space clearly demonstrate the structure of the learned abstraction.




Continuous MDP Homomorphisms and Homomorphic Policy Gradient

Abstraction has been widely studied as a way to improve the efficiency a...

A Simple Approach for State-Action Abstraction using a Learned MDP Homomorphism

Animals are able to rapidly infer from limited experience when sets of s...

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

Lying on the heart of intelligent decision-making systems, how policy is...

Improving the Universality and Learnability of Neural Programmer-Interpreters with Combinator Abstraction

To overcome the limitations of Neural Programmer-Interpreters (NPI) in i...

Run, skeleton, run: skeletal model in a physics-based simulation

In this paper, we present our approach to solve a physics-based reinforc...

Expected Policy Gradients for Reinforcement Learning

We propose expected policy gradients (EPG), which unify stochastic polic...

All-Action Policy Gradient Methods: A Numerical Integration Approach

While often stated as an instance of the likelihood ratio trick [Rubinst...
