A Note on Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

11/25/2022
by Dylan J. Foster, et al.

We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms of the same quantity. Estimation-to-Decisions is a reduction, which lifts algorithms for (supervised) online estimation into algorithms for decision making. In this note, we show that by combining Estimation-to-Decisions with a specialized form of optimistic estimation introduced by Zhang (2022), it is possible to obtain guarantees that improve upon those of Foster et al. (2021) by accommodating more lenient notions of estimation error. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation.
