Randomized Value Functions via Posterior State-Abstraction Sampling

10/05/2020
by Dilip Arumugam, et al.

State abstraction has been an essential tool for dramatically improving the sample efficiency of reinforcement-learning algorithms. Indeed, by exposing and accentuating various types of latent structure within the environment, different classes of state abstraction have enabled improved theoretical guarantees and empirical performance. When dealing with state abstractions that capture structure in the value function, however, a standard assumption is that the true abstraction is either supplied or computed a priori, which is often unrealistic. This leaves open the question of how to efficiently uncover such latent structure while jointly seeking out optimal behavior. Taking inspiration from the bandit literature, we propose that an agent seeking out latent task structure must explicitly represent and maintain its uncertainty over that structure as part of its overall uncertainty about the environment. We introduce a practical algorithm that does so by maintaining two posterior distributions: one over state abstractions and one over abstract-state values. In empirically validating our approach, we find substantial performance gains in the multi-task setting, where tasks share a common, low-dimensional representation.
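The abstract does not spell out the algorithm's details. As a rough illustration of what it means to maintain uncertainty over both an abstraction and abstract-state values, here is a minimal bandit analogue. It is not the paper's method, which works in MDPs with randomized value functions. The sketch assumes Bernoulli rewards, a small hypothetical list of candidate abstractions (each a partition of arms into abstract states) under a uniform prior, and a Beta(1, 1) prior on each abstract state's value. Each round it samples an abstraction from its posterior, then samples abstract-state values given that abstraction, and acts greedily on the sample (Thompson sampling). All function names are hypothetical.

```python
import math
import random


def log_beta(a, b):
    """log B(a, b) via log-gamma, for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def group_counts(labels, succ, fail):
    """Pool per-arm success/failure counts into per-abstract-state counts."""
    k = max(labels) + 1
    s, f = [0] * k, [0] * k
    for arm, g in enumerate(labels):
        s[g] += succ[arm]
        f[g] += fail[arm]
    return s, f


def log_marginal(labels, succ, fail):
    """Log evidence of the data under one abstraction (Beta(1, 1) prior per abstract state)."""
    s, f = group_counts(labels, succ, fail)
    return sum(log_beta(1 + si, 1 + fi) - log_beta(1, 1) for si, fi in zip(s, f))


def choose_arm(candidates, succ, fail, rng):
    # 1) Sample an abstraction from its posterior (uniform prior over candidates).
    logs = [log_marginal(c, succ, fail) for c in candidates]
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]  # shift by max to avoid underflow
    labels = rng.choices(candidates, weights=weights, k=1)[0]
    # 2) Sample abstract-state values from the Beta posterior given that abstraction.
    s, f = group_counts(labels, succ, fail)
    values = [rng.betavariate(1 + si, 1 + fi) for si, fi in zip(s, f)]
    # 3) Act greedily: each arm inherits the sampled value of its abstract state.
    return max(range(len(labels)), key=lambda arm: values[labels[arm]])


def run(true_means, candidates, steps, seed=0):
    """Simulate Bernoulli arms and return per-arm success/failure counts."""
    rng = random.Random(seed)
    n = len(true_means)
    succ, fail = [0] * n, [0] * n
    for _ in range(steps):
        arm = choose_arm(candidates, succ, fail, rng)
        if rng.random() < true_means[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

For example, with `true_means = [0.2, 0.2, 0.8, 0.8]` and candidates `[[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0], [0, 1, 2, 3]]`, evidence pooled across arms that share an abstract state lets the posterior favor the first partition, and pulls concentrate on arms 2 and 3. Enumerating candidates exactly only works for tiny problems; it is used here purely to keep the two posteriors explicit.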
