DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

03/13/2022
by Jinxin Liu, et al.

Offline reinforcement learning algorithms promise to be applicable in settings where a fixed dataset is available and no new experience can be acquired. However, this formulation is inevitably data-hungry, and in practice collecting a large offline dataset for one specific task in one specific environment is costly and laborious. In this paper, we therefore 1) formulate offline dynamics adaptation, which uses (source) offline data collected under different dynamics to relax the requirement for extensive (target) offline data, 2) characterize the dynamics shift problem, under which prior offline methods do not scale well, and 3) derive a simple dynamics-aware reward augmentation (DARA) framework for both model-free and model-based offline settings. Specifically, DARA emphasizes learning from those source transitions that are well suited to the target environment and mitigates the offline dynamics shift by characterizing state-action-next-state pairs instead of the typical state-action distribution sketched by prior offline RL methods. The experimental evaluation demonstrates that DARA, by augmenting rewards in the source offline dataset, can acquire an adaptive policy for the target environment while significantly reducing the amount of target offline data required. With only modest amounts of target offline data, our method consistently outperforms prior offline RL methods in both simulated and real-world tasks.
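
The idea of augmenting source rewards based on state-action-next-state pairs can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a toy one-dimensional problem in which both the source and target transition densities are known Gaussians, and it penalizes each source reward by the log-density ratio between the two. The names `augment_rewards`, `source_step`, `target_step`, `std`, and `eta` are hypothetical and chosen only for illustration. In a real offline setting the densities are unknown and would have to be estimated from data.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def augment_rewards(transitions, source_step, target_step, std=0.1, eta=1.0):
    """Return source transitions with dynamics-aware rewards.

    transitions: list of (s, a, r, s_next) tuples from the source dataset.
    source_step, target_step: assumed toy dynamics, s_next ~ N(s + step * a, std^2).
    eta: assumed penalty weight (hypothetical hyperparameter).
    """
    augmented = []
    for s, a, r, s_next in transitions:
        log_p_source = gaussian_logpdf(s_next, s + source_step * a, std)
        log_p_target = gaussian_logpdf(s_next, s + target_step * a, std)
        # Positive when the transition is more likely under the source
        # dynamics than under the target dynamics.
        delta_r = log_p_source - log_p_target
        augmented.append((s, a, r - eta * delta_r, s_next))
    return augmented


source_data = [
    (0.0, 1.0, 1.0, 1.0),   # exactly what the source dynamics predict
    (0.0, 1.0, 1.0, 0.75),  # equally plausible under source and target
    (0.0, 0.0, 1.0, 0.0),   # zero action: both dynamics agree
]
augmented = augment_rewards(source_data, source_step=1.0, target_step=0.5)
for s, a, r, s_next in augmented:
    print(f"s={s}, a={a}, s'={s_next}, augmented reward={r:.2f}")
```

In this sketch, a transition that only the source dynamics could plausibly produce has its reward sharply reduced, while transitions that are equally plausible in both environments keep their original reward. Any offline RL algorithm trained on the augmented dataset would therefore be steered toward source behavior that transfers to the target.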


research
06/27/2022

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

Learning effective reinforcement learning (RL) policies to solve real-wo...
research
05/27/2020

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning po...
research
11/02/2021

Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics

Offline reinforcement learning leverages large datasets to train policie...
research
04/28/2021

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Standard dynamics models for continuous control make use of feedforward ...
research
09/30/2022

S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning

Offline reinforcement learning (Offline RL) suffers from the innate dist...
research
06/06/2023

State Regularized Policy Optimization on Data with Dynamics Shift

In many real-world scenarios, Reinforcement Learning (RL) algorithms are...
research
11/21/2022

Data-Driven Offline Decision-Making via Invariant Representation Learning

The goal in offline data-driven decision-making is to synthesize decisions ...
