Delegative Reinforcement Learning: learning to avoid traps with a little help

by   Vanessa Kosoy, et al.
Machine Intelligence Research Institute

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning.) The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.


page 1

page 2

page 3

page 4


Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...

The Impatient May Use Limited Optimism to Minimize Regret

Discounted-sum games provide a formal model for the study of reinforceme...

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

We study the regret guarantee for risk-sensitive reinforcement learning ...

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

We give a simple optimistic algorithm for which it is easy to derive reg...

Delayed Feedback in Episodic Reinforcement Learning

There are many provably efficient algorithms for episodic reinforcement ...

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...

Please sign up or login with your details

Forgot password? Click here to reset