PAC Reinforcement Learning with Rich Observations

02/08/2016
by   Akshay Krishnamurthy, et al.
University of Massachusetts Amherst
Microsoft

We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near-optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.

10/29/2016

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

This paper studies systematic exploration for reinforcement learning wit...
03/15/2020

Provably Efficient Exploration for RL with Unsupervised Learning

We study how to use unsupervised learning for efficient exploration in r...
03/01/2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations

We study the computational tractability of provably sample-efficient (PA...
11/20/2019

Corruption Robust Exploration in Episodic Reinforcement Learning

We initiate the study of multi-stage episodic reinforcement learning und...
02/07/2020

Causally Correct Partial Models for Reinforcement Learning

In reinforcement learning, we can learn a model of future observations a...
02/28/2017

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Conventional reinforcement learning methods for Markov decision processe...
03/07/2022

Influencing Long-Term Behavior in Multiagent Reinforcement Learning

The main challenge of multiagent reinforcement learning is the difficult...

Code Repositories

NIPS2016

This project collects the accepted papers and their links to arXiv or GitXiv

