On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

by   Sumana Basu, et al.
McGill University

Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.


page 1

page 2

page 3

page 4


Reinforcement Learning in a Physics-Inspired Semi-Markov Environment

Reinforcement learning (RL) has been demonstrated to have great potentia...

Reinforcement Learning When All Actions are Not Always Available

The Markov decision process (MDP) formulation used to model many real-wo...

Optimization of anemia treatment in hemodialysis patients via reinforcement learning

Objective: Anemia is a frequent comorbidity in hemodialysis patients tha...

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

Reinforcement learning (RL) agents performing complex tasks must be able...

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

This article develops a deep reinforcement learning (Deep-RL) framework ...

Reinforcement learning and Bayesian data assimilation for model-informed precision dosing in oncology

Model-informed precision dosing (MIPD) using therapeutic drug/biomarker ...

Please sign up or login with your details

Forgot password? Click here to reset