Efficient Model-Free Exploration in Low-Rank MDPs

by Zakaria Mhammedi, et al.

A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required. Low-Rank Markov Decision Processes – where transition probabilities admit a low-rank factorization based on an unknown feature embedding – offer a simple yet expressive framework for RL with function approximation, but existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions such as latent variable structure, access to model-based function approximation, or reachability. In this work, we propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs that is both computationally efficient and model-free, allowing for general function approximation and requiring no additional structural assumptions. Our algorithm, VoX, uses the notion of a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, performing efficient optimal design computation by interleaving representation learning and policy optimization. Our analysis – which is appealingly simple and modular – carefully combines several techniques, including a new reduction from optimal design computation to policy optimization based on the Frank-Wolfe method, and an improved analysis of a certain minimax representation learning objective found in prior work.




Related papers:

Model-free Representation Learning and Exploration in Low-rank MDPs

The low rank MDP has emerged as an important model for studying represen...

Provably Efficient Representation Learning in Low-rank Markov Decision Processes

The success of deep reinforcement learning (DRL) is due to the power of ...

Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

We study the design of sample-efficient algorithms for reinforcement lea...

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

In this paper, we study representation learning in partially observable ...

Making Linear MDPs Practical via Contrastive Representation Learning

It is common to address the curse of dimensionality in Markov decision p...

Reinforcement Learning in Low-Rank MDPs with Density Features

MDPs with low-rank transitions – that is, the transition matrix can be f...

Provably Efficient Algorithm for Nonstationary Low-Rank MDPs

Reinforcement learning (RL) under changing environment models many real-...
