Continuous MDP Homomorphisms and Homomorphic Policy Gradient

by Sahand Rezaei-Shoshtari et al.

Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.
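As a toy illustration of the homomorphism condition the abstract refers to, consider a point mass in the plane with dynamics x' = x + u and reward -||x||^2 - ||u||^2, which is symmetric under rotations about the origin. A continuous MDP homomorphism can map the 2D state to its distance from the origin and the 2D action to its radial and tangential components; transitions and rewards then commute exactly with the abstraction maps. This sketch is purely illustrative (a rotation-invariant toy system, not the paper's learned homomorphism or the lax bisimulation metric); all function names are made up for the example.

```python
import math

def f(x):
    """State abstraction: distance to the origin (rotation-invariant)."""
    return math.hypot(x[0], x[1])

def g(x, u):
    """State-dependent action abstraction: radial and tangential
    components of u relative to the current state direction."""
    z = f(x)
    ex = (x[0] / z, x[1] / z)      # unit radial direction
    et = (-ex[1], ex[0])           # unit tangential direction
    return (u[0] * ex[0] + u[1] * ex[1],
            u[0] * et[0] + u[1] * et[1])

def step(x, u):
    """Ground dynamics: x' = x + u."""
    return (x[0] + u[0], x[1] + u[1])

def abstract_step(z, v):
    """Abstract dynamics induced by f and g: ||x + u|| computed
    from the radial/tangential decomposition alone."""
    vr, vt = v
    return math.hypot(z + vr, vt)

def reward(x, u):
    """Ground reward: -||x||^2 - ||u||^2."""
    return -(x[0] ** 2 + x[1] ** 2) - (u[0] ** 2 + u[1] ** 2)

def abstract_reward(z, v):
    """Abstract reward, a function of abstract state and action only."""
    return -z ** 2 - (v[0] ** 2 + v[1] ** 2)

if __name__ == "__main__":
    import random
    random.seed(0)
    # Homomorphism conditions: f(step(x, u)) == abstract_step(f(x), g(x, u))
    # and reward(x, u) == abstract_reward(f(x), g(x, u)) for all x != 0, u.
    for _ in range(1000):
        x = (random.uniform(0.1, 2.0), random.uniform(0.1, 2.0))
        u = (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
        assert abs(f(step(x, u)) - abstract_step(f(x), g(x, u))) < 1e-9
        assert abs(reward(x, u) - abstract_reward(f(x), g(x, u))) < 1e-9
    print("homomorphism conditions hold on all samples")
```

Because the abstract transition and reward depend only on (z, v), a policy optimized on the abstract MDP lifts back to the ground MDP, which is the structure the paper's homomorphic policy gradient exploits.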




Related papers:

- Policy Gradient Methods in the Presence of Symmetries and State Abstractions
- An Off-policy Policy Gradient Theorem Using Emphatic Weightings
- A Simple Approach for State-Action Abstraction using a Learned MDP Homomorphism
- Off-Policy Actor-Critic with Emphatic Weightings
- Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
