Statistical Inference After Adaptive Sampling in Non-Markovian Environments

02/14/2022
by Kelly W. Zhang, et al.

There is a great desire to use adaptive sampling methods, such as reinforcement learning (RL) and bandit algorithms, for the real-time personalization of interventions in digital applications like mobile health and education. A major obstacle to the wider use of such algorithms in practice is the lack of assurance that the resulting adaptively collected data can be used to reliably answer inferential questions, including questions about time-varying causal effects. Current methods for statistical inference on such data are insufficient because they (a) make strong assumptions about the environment dynamics, e.g., assume a contextual bandit or Markovian environment, or (b) require the data to be collected with one adaptive sampling algorithm per user, which excludes data collected by algorithms that learn to select actions by pooling the data of multiple users. In this work, we make initial progress by introducing the adaptive sandwich estimator to quantify uncertainty; this estimator (a) is valid even when user rewards and contexts are non-stationary and highly dependent over time, and (b) accommodates settings in which an online adaptive sampling algorithm learns using the data of all users. Furthermore, our inference method is robust to misspecification of the reward models used by the adaptive sampling algorithm. This work is motivated by our experience designing experiments in which RL algorithms are used to select actions; reliable statistical inference is essential for conducting the primary analyses after such a trial is over.
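The abstract's "adaptive sandwich estimator" extends the classical sandwich (robust) variance construction to adaptively collected data; the paper's own estimator is not specified here. As background only, below is a minimal sketch of the *classical* sandwich variance for ordinary least squares on i.i.d. data (the HC0 heteroskedasticity-robust form, V = B⁻¹ M B⁻¹ with bread B = Σᵢ xᵢxᵢ' and meat M = Σᵢ xᵢxᵢ' eᵢ²). All variable names and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulated data: linear outcome with heteroskedastic noise,
# so the model-based OLS variance would be wrong but the sandwich is valid.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.0 + 0.5 * np.abs(X[:, 1]), size=n)

# OLS point estimate: an M-estimator solving sum_i x_i (y_i - x_i' beta) = 0.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Classical sandwich (HC0): bread^{-1} @ meat @ bread^{-1}.
bread_inv = np.linalg.inv(X.T @ X)           # (sum_i x_i x_i')^{-1}
meat = X.T @ (X * resid[:, None] ** 2)       # sum_i x_i x_i' e_i^2
V = bread_inv @ meat @ bread_inv             # robust covariance of beta_hat
se = np.sqrt(np.diag(V))                     # robust standard errors
print("estimate:", beta_hat, "robust SE:", se)
```

The key point of the paper is that this classical construction is *not* valid as-is when actions are chosen adaptively (and pooled across users); the adaptive sandwich estimator modifies the bread and meat terms to account for that dependence.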


Related research

04/29/2021 · Statistical Inference with M-Estimators on Bandit Data
Bandit algorithms are increasingly used in real world sequential decisio...

10/19/2022 · Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential ...

02/25/2021 · Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits
To balance exploration and exploitation, multi-armed bandit algorithms n...

08/08/2021 · Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning
The recent emergence of reinforcement learning has created a demand for ...

03/21/2023 · Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches
Standard bandit algorithms that assume continual reallocation of measure...

12/21/2022 · Online Statistical Inference for Matrix Contextual Bandit
Contextual bandit has been widely used for sequential decision-making ba...
