IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

04/20/2023
by Philippe Hansen-Estruch, et al.

Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization against divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit via advantage-weighted regression (AWR) in prior methods. Instead, we propose drawing samples from a diffusion-parameterized behavior policy and reweighting them with critic-computed weights to importance sample our intended policy. We introduce Implicit Diffusion Q-learning (IDQL), which combines our generalized IQL critic with this policy extraction method. IDQL maintains the ease of implementation of IQL while outperforming prior offline RL methods and demonstrating robustness to hyperparameters. Code is available at https://github.com/philippe-eecs/IDQL.
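
To make the extraction step concrete, below is a minimal sketch of critic-weighted resampling from a behavior policy, written in plain NumPy. The expectile-style weight |tau - 1{Q(s,a) < V(s)}|, the helper names (behavior_sampler, q_fn, v_fn), and the sample count of 128 are illustrative assumptions for this sketch rather than the paper's exact recipe; the authors' implementation is in the linked repository.

```python
import numpy as np

def expectile_weight(q_values, v_value, tau=0.7):
    """Critic-derived weight per candidate action, using the expectile-style
    form |tau - 1{Q(s,a) < V(s)}| (an assumption for this sketch)."""
    below = (q_values < v_value).astype(np.float64)
    return np.abs(tau - below)

def idql_select_action(state, behavior_sampler, q_fn, v_fn,
                       num_samples=128, tau=0.7, rng=None):
    """Draw candidate actions from the behavior policy (a diffusion model in
    IDQL; any sampler works here) and resample one action with probability
    proportional to its critic weight."""
    rng = np.random.default_rng() if rng is None else rng
    actions = behavior_sampler(state, num_samples)   # shape (N, action_dim)
    q_values = q_fn(state, actions)                  # shape (N,)
    v_value = v_fn(state)                            # scalar V(s)
    weights = expectile_weight(q_values, v_value, tau)
    probs = weights / weights.sum()
    return actions[rng.choice(num_samples, p=probs)]

# Toy usage with stand-in networks; a real agent would plug in a trained
# diffusion behavior model and the learned Q and V networks.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    behavior_sampler = lambda s, n: rng.normal(size=(n, 3))
    q_fn = lambda s, a: a @ np.array([1.0, -0.5, 0.2]) + s.sum()
    v_fn = lambda s: float(s.sum())
    print(idql_select_action(np.ones(4), behavior_sampler, q_fn, v_fn, rng=rng))
```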
