Offline Reinforcement Learning with Reverse Model-based Imagination

10/01/2021
by Jianhao Wang, et al.

In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset. To address this problem, recent offline RL methods attempt to introduce a conservatism bias that encourages learning in high-confidence areas. Model-free approaches directly encode such bias into policy or value function learning using conservative regularizations or special network structures, but their constrained policy search limits generalization beyond the offline dataset. Model-based approaches learn forward dynamics models with conservatism quantifications and then generate imaginary trajectories to extend the offline dataset. However, due to the limited samples in the offline dataset, conservatism quantifications often suffer from overgeneralization in out-of-support regions. These unreliable conservatism measures can mislead forward model-based imagination into undesired areas, leading to overly aggressive behaviors. To encourage more conservatism, we propose a novel model-based offline RL framework, called Reverse Offline Model-based Imagination (ROMI). We learn a reverse dynamics model in conjunction with a novel reverse policy, which can generate rollouts leading to the target goal states within the offline dataset. These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset. ROMI can be effectively combined with off-the-shelf model-free algorithms to enable model-based generalization with proper conservatism. Empirical results show that our method generates more conservative behaviors and achieves state-of-the-art performance on offline RL benchmark tasks.
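
The core idea described above, rolling a learned reverse dynamics model backward from states inside the offline dataset and treating the resulting transitions as extra training data for a model-free learner, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class `ReverseDynamicsModel`, the function `reverse_rollout`, the placeholder reverse policy, and all dimensions are hypothetical names chosen for the example.

```python
import torch
import torch.nn as nn

class ReverseDynamicsModel(nn.Module):
    """Hypothetical reverse model: given the current state s_t and the action
    a_{t-1} that led into it, predict the previous state s_{t-1} and reward."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # previous state + scalar reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        prev_state, reward = out[..., :-1], out[..., -1:]
        return prev_state, reward


def reverse_rollout(reverse_model, reverse_policy, start_states, horizon=5):
    """Roll backward from states sampled inside the offline dataset.

    Each step asks the reverse policy for an action that could have led into
    the current state, then asks the reverse model which state that action
    came from. The tuples are stored in forward order (s_{t-1}, a_{t-1}, r, s_t)
    so they can be appended to the dataset for model-free policy learning.
    """
    imagined = []
    state = start_states
    for _ in range(horizon):
        with torch.no_grad():
            action = reverse_policy(state)                     # a_{t-1} ~ pi_rev(.|s_t)
            prev_state, reward = reverse_model(state, action)  # s_{t-1}, r
        imagined.append((prev_state, action, reward, state))
        state = prev_state
    return imagined


# Usage sketch: imagine 5-step reverse traces starting from dataset states.
state_dim, action_dim = 17, 6                         # e.g. a MuJoCo-style task
model = ReverseDynamicsModel(state_dim, action_dim)
policy = lambda s: torch.tanh(torch.randn(s.shape[0], action_dim))  # placeholder reverse policy
starts = torch.randn(32, state_dim)                   # stand-in for sampled dataset states
traces = reverse_rollout(model, policy, starts, horizon=5)
```

Because every rollout terminates at a state that actually appears in the offline dataset, the imagined trajectories stay anchored to in-support regions, which is the source of the conservatism the abstract emphasizes.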
