DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

10/18/2020
by   Aayam Shrestha, et al.

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are non-parametric models that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. On the theoretical side, we give conditions under which the performance of DAC-MDP solutions can be lower-bounded. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large, complex offline RL problems.
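To make the idea of a derived, finitely-represented MDP with costs more concrete, here is a minimal sketch in plain Python. The abstract does not spell out the construction, so everything specific here is an assumption for illustration: the use of k-nearest-neighbor averaging, a cost proportional to the distance between a state and its neighbors, and the names `build_dac_mdp`, `value_iteration`, and `cost_coef` are all hypothetical and are not taken from the authors' code.

```python
import math

def build_dac_mdp(transitions, actions, k=1, cost_coef=1.0):
    """Derive a finite MDP from a static dataset (illustrative assumption only).

    transitions: list of (state, action, reward, next_state); states are
    tuples of floats (e.g. a learned representation).
    Returns the core states and a model mapping (core_index, action) to a
    list of (next_core_index, penalized_reward) pairs.
    """
    # Core states of the derived MDP: the next-states seen in the dataset.
    core = [t[3] for t in transitions]
    model = {}
    for i, s in enumerate(core):
        for a in actions:
            # Dataset transitions that used action a, ranked by distance to s.
            cands = sorted(
                (math.dist(s, t[0]), j)
                for j, t in enumerate(transitions)
                if t[1] == a
            )
            # Each neighbor contributes its reward minus a distance-based
            # cost, so poorly covered regions of the data look less attractive.
            model[(i, a)] = [
                (j, transitions[j][2] - cost_coef * d) for d, j in cands[:k]
            ]
    return core, model

def value_iteration(core, model, actions, gamma=0.9, iters=200):
    """Solve the derived finite MDP by averaging over neighbors."""
    v = [0.0] * len(core)
    for _ in range(iters):
        new_v = []
        for i in range(len(core)):
            qs = []
            for a in actions:
                nbrs = model[(i, a)]
                if nbrs:
                    # Transition j leads to core state j (its next-state).
                    q = sum(r + gamma * v[j] for j, r in nbrs) / len(nbrs)
                    qs.append(q)
            new_v.append(max(qs) if qs else 0.0)
        v = new_v
    return v
```

In this sketch, raising `cost_coef` makes the solution more conservative, because actions whose nearest recorded transitions lie far from the current state are penalized more heavily. Changing `gamma` or the reward values and re-running `value_iteration` re-solves the same derived model without retraining anything, which is one way to read the abstract's remark about supporting multiple objectives.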
