Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

by Andrew Wagenmaker, et al.

We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of <cit.>, but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and provide a new algorithm, 𝖠𝖤^2, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, improving over prior work exponentially. We complement these results with lower bounds that show that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.




Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Past research on interactive decision making problems (bandits, reinforc...

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

In the classical multi-armed bandit problem, instance-dependent algorith...

A Note on Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

We consider the problem of interactive decision making, encompassing str...

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Stochastic linear bandits are a natural and simple generalisation of fin...

Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality

We study stochastic structured bandits for minimizing regret. The fact t...

On the Complexity of Adversarial Decision Making

A central problem in online learning and decision making – from bandits ...

Subset-Based Instance Optimality in Private Estimation

We propose a new definition of instance optimality for differentially pr...
