Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

09/13/2020
by   Aviv Rosenberg, et al.
0

We consider provably-efficient reinforcement learning (RL) in non-episodic factored Markov decision processes (FMDPs). All previous algorithms for regret minimization in this setting made the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first provably-efficient algorithm that has to learn the structure of the FMDP while minimizing its regret. Our algorithm is based on the optimism in face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. It maintains its computational efficiency even though the number of possible structures is exponential.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/31/2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Reinforcement learning (RL) in episodic, factored Markov decision proces...
research
02/06/2020

Near-optimal Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms for the Non-episodic Setting

We study reinforcement learning in factored Markov decision processes (F...
research
06/24/2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

We study minimax optimal reinforcement learning in episodic factored Mar...
research
07/19/2021

Provably Efficient Multi-Task Reinforcement Learning with Model Transfer

We study multi-task reinforcement learning (RL) in tabular episodic Mark...
research
01/06/2021

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

We study reinforcement learning (RL) with linear function approximation ...
research
06/21/2020

On Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throug...
research
12/28/2021

Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

We consider large-scale Markov decision processes with an unknown cost f...

Please sign up or login with your details

Forgot password? Click here to reset