Continuous MDP Homomorphisms and Homomorphic Policy Gradient

by Sahand Rezaei-Shoshtari et al.

Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.
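As a toy illustration of the homomorphism condition the abstract refers to, consider a point mass in the plane with dynamics x' = x + u and reward -||x||^2 - ||u||^2, which is symmetric under rotations about the origin. A continuous MDP homomorphism can map the 2D state to its distance from the origin and the 2D action to its radial and tangential components; transitions and rewards then commute exactly with the abstraction maps. This sketch is purely illustrative (a rotation-invariant toy system, not the paper's learned homomorphism or the lax bisimulation metric); all function names are made up for the example.

```python
import math

def f(x):
    """State abstraction: distance to the origin (rotation-invariant)."""
    return math.hypot(x[0], x[1])

def g(x, u):
    """State-dependent action abstraction: radial and tangential
    components of u relative to the current state direction."""
    z = f(x)
    ex = (x[0] / z, x[1] / z)      # unit radial direction
    et = (-ex[1], ex[0])           # unit tangential direction
    return (u[0] * ex[0] + u[1] * ex[1],
            u[0] * et[0] + u[1] * et[1])

def step(x, u):
    """Ground dynamics: x' = x + u."""
    return (x[0] + u[0], x[1] + u[1])

def abstract_step(z, v):
    """Abstract dynamics induced by f and g: ||x + u|| computed
    from the radial/tangential decomposition alone."""
    vr, vt = v
    return math.hypot(z + vr, vt)

def reward(x, u):
    """Ground reward: -||x||^2 - ||u||^2."""
    return -(x[0] ** 2 + x[1] ** 2) - (u[0] ** 2 + u[1] ** 2)

def abstract_reward(z, v):
    """Abstract reward, a function of abstract state and action only."""
    return -z ** 2 - (v[0] ** 2 + v[1] ** 2)

if __name__ == "__main__":
    import random
    random.seed(0)
    # Homomorphism conditions: f(step(x, u)) == abstract_step(f(x), g(x, u))
    # and reward(x, u) == abstract_reward(f(x), g(x, u)) for all x != 0, u.
    for _ in range(1000):
        x = (random.uniform(0.1, 2.0), random.uniform(0.1, 2.0))
        u = (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
        assert abs(f(step(x, u)) - abstract_step(f(x), g(x, u))) < 1e-9
        assert abs(reward(x, u) - abstract_reward(f(x), g(x, u))) < 1e-9
    print("homomorphism conditions hold on all samples")
```

Because the abstract transition and reward depend only on (z, v), a policy optimized on the abstract MDP lifts back to the ground MDP, which is the structure the paper's homomorphic policy gradient exploits.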




Related papers:

- Policy Gradient Methods in the Presence of Symmetries and State Abstractions
- An Off-policy Policy Gradient Theorem Using Emphatic Weightings
- A Simple Approach for State-Action Abstraction using a Learned MDP Homomorphism
- Off-Policy Actor-Critic with Emphatic Weightings
- Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
