Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

by Yunzhe Zhou, et al.

In this article, we propose a novel pessimism-based Bayesian learning method for estimating optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, existing solutions can produce sub-optimal policies. The pessimism principle addresses this issue by discouraging the recommendation of actions that are less explored conditional on the state. However, nearly all pessimism-based methods rely on a key hyper-parameter that quantifies the degree of pessimism, and their performance can be highly sensitive to the choice of this parameter. We propose to integrate the pessimism principle with Thompson sampling and Bayesian machine learning to optimize the degree of pessimism. We derive a credible set whose boundary uniformly lower bounds the optimal Q-function, so no additional tuning of the degree of pessimism is required. We develop a general Bayesian learning method that works with a range of models, from Bayesian linear basis models to Bayesian neural networks. Our computational algorithm is based on variational inference, which is highly efficient and scalable. We establish theoretical guarantees for the proposed method and show empirically, through both simulations and a real data example, that it outperforms existing state-of-the-art solutions.
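To make the pessimism principle concrete, here is a minimal sketch for a single decision point. It assumes a Bayesian linear basis model per action with a conjugate Gaussian prior and known noise variance (a closed-form posterior, not the variational inference used in the article), and the helper names `features`, `fit_posterior`, `q_bounds`, and `choose` are hypothetical. Each action is scored by a lower credible bound on its Q-value, so actions that are rarely observed near a given state are penalized. The fixed quantile `z` is an illustrative assumption: the method described above is designed precisely to avoid hand-tuning this degree of pessimism, and the sketch does not reproduce that construction.

```python
import numpy as np


def features(s):
    """Hypothetical linear basis phi(s) = [1, s] for a scalar state."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.column_stack([np.ones_like(s), s])


def fit_posterior(states, rewards, sigma2=1.0, tau2=10.0):
    """Conjugate Gaussian posterior for r = phi(s)^T beta + eps,
    eps ~ N(0, sigma2), prior beta ~ N(0, tau2 * I) (assumed known)."""
    Phi = features(states)
    y = np.asarray(rewards, dtype=float)
    precision = Phi.T @ Phi / sigma2 + np.eye(Phi.shape[1]) / tau2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma2
    return mean, cov


def q_bounds(posterior, s, z=1.645):
    """Posterior mean of Q(s, a) and a lower credible bound mean - z * sd."""
    mean, cov = posterior
    phi = features(s)
    mu = phi @ mean
    # Row-wise quadratic form phi_i^T cov phi_i gives the posterior variance.
    sd = np.sqrt(np.einsum("ij,jk,ik->i", phi, cov, phi))
    return mu, mu - z * sd


def choose(posteriors, s, pessimistic=True, z=1.645):
    """Greedy action per state, using either the lower bound or the mean."""
    scores = []
    for post in posteriors:
        mu, lower = q_bounds(post, s, z)
        scores.append(lower if pessimistic else mu)
    return np.argmax(np.vstack(scores), axis=0)


# Toy offline data: action 0 is well explored, action 1 has two observations.
post0 = fit_posterior(np.linspace(-1, 1, 50), np.ones(50))
post1 = fit_posterior([0.0, 0.1], [1.5, 1.5])
states = [0.0, 0.5]
print(choose([post0, post1], states, pessimistic=False))
print(choose([post0, post1], states, pessimistic=True))
```

In this toy data, action 1 has the higher posterior mean but only two observations, so the naive greedy rule recommends it, while the pessimistic rule falls back on the well-explored action 0.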




Dynamic Treatment Regimes using Bayesian Additive Regression Trees for Censored Outcomes

To achieve the goal of providing the best possible care to each patient,...

PASTA: Pessimistic Assortment Optimization

We consider a class of assortment optimization problems in an offline da...

Exploitation vs Caution: Risk-sensitive Policies for Offline Learning

Offline model learning for planning is a branch of machine learning that...

Online Parameter-Free Learning of Multiple Low Variance Tasks

We propose a method to learn a common bias vector for a growing sequence...

Variational Inference: Posterior Threshold Improves Network Clustering Accuracy in Sparse Regimes

Variational inference has been widely used in machine learning literatur...

Variational Latent Branching Model for Off-Policy Evaluation

Model-based methods have recently shown great potential for off-policy e...

A Bayesian Variational principle for dynamic Self Organizing Maps

We propose organisation conditions that yield a method for training SOM ...
