Reinforcement Learning in Low-Rank MDPs with Density Features

by Audrey Huang et al.

MDPs with low-rank transitions, i.e., whose transition matrix factors into the product of two matrices (left and right), form a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. A central technical challenge is that the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions such as reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation-learning setting, where the density features are unknown and must be learned from an exponentially large candidate set.
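To make the low-rank structure concrete, here is a minimal NumPy sketch (an illustration of the factorization, not the paper's algorithm). It builds a small tabular MDP whose transitions factor as P(s'|s,a) = ⟨φ(s,a), μ(·)⟩, where the rows of μ are the density features (the right factor), and then rolls a policy forward to show that every level's state occupancy is a linear combination of those d features. All sizes and distributions are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 2, 3  # states, actions, rank (all illustrative)

# Right factor: d density features, each a distribution over next states.
mu = rng.dirichlet(np.ones(S), size=d)        # shape (d, S)
# Left factor: for each (s, a), mixture weights over the d features.
phi = rng.dirichlet(np.ones(d), size=(S, A))  # shape (S, A, d)

# Low-rank transition kernel: P[s, a, s'] = <phi(s, a), mu(:, s')>.
P = phi @ mu                                  # shape (S, A, S)
assert np.allclose(P.sum(axis=2), 1.0)        # valid distributions

# Roll a uniform policy forward: the occupancy at each level h >= 1
# lies in the span of the d density features mu_1, ..., mu_d.
pi = np.full((S, A), 1.0 / A)
occ = np.full(S, 1.0 / S)                     # d^0: uniform initial state
for h in range(3):
    # Feature weights w_k = sum_{s,a} d^h(s) pi(a|s) phi_k(s, a).
    w = np.einsum('s,sa,sak->k', occ, pi, phi)
    occ = w @ mu                              # d^{h+1} = sum_k w_k mu_k
    assert np.allclose(occ.sum(), 1.0)
```

The key point the sketch demonstrates is that, regardless of the policy, next-state occupancies live in a d-dimensional subspace spanned by the density features, which is what makes them a natural target for occupancy estimation.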


Related papers

Representation Learning for Online and Offline RL in Low-rank MDPs
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
Efficient Model-Free Exploration in Low-Rank MDPs
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure
