Sequential Stochastic Optimization in Separable Learning Environments

08/21/2021
by R. Reid Bishop, et al.

We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts. These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process, the observation process only observes the modulation process, and the modulation process is exogenous to control. We model this broad class of problems as a partially observed Markov decision process (POMDP). The belief function for the modulation process is control invariant, thus separating the estimation of the modulation process from the control of the state process. We call this specially structured POMDP the separable POMDP, or SEP-POMDP, and show that it (i) can model a broad class of application areas, e.g., inventory control, finance, and healthcare systems; (ii) inherits value function and optimal policy structure from a set of completely observed MDPs; (iii) can serve as a bridge between classical models of sequential decision making under uncertainty, which have fully specified model artifacts, and models that are not fully specified and therefore require predictive methods from statistics and machine learning; and (iv) allows for specialized approximate solution procedures.
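To make the separation property concrete, here is a minimal Python sketch under assumed toy data. The matrices `P` and `Q`, the reward array `R`, and the functions `update_belief` and `greedy_action` are hypothetical illustrations, not the authors' model or solution procedure. The belief recursion is a standard discrete Bayes (hidden Markov model forward) filter; it takes no action argument, which is what control invariance of the modulation belief means here. The myopic action rule then weights the rewards of the completely observed, per-modulation-state MDPs by the current belief.

```python
import numpy as np

# Hypothetical toy model (assumed for illustration, not taken from the paper).
# P[m, n]: probability the exogenous modulation process moves from m to n.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Q[m, y]: probability of observation y when the modulation state is m.
Q = np.array([[0.7, 0.3],
              [0.1, 0.9]])


def update_belief(belief: np.ndarray, y: int) -> np.ndarray:
    """One step of the discrete Bayes (HMM forward) filter.

    No action enters this function: the modulation process is exogenous
    and seen only through y, so the belief update is control invariant.
    """
    predicted = belief @ P              # propagate belief one step ahead
    unnormalized = predicted * Q[:, y]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()


def greedy_action(belief: np.ndarray, x: int, rewards: np.ndarray) -> int:
    """Myopic action for observed state x (a heuristic, not the paper's method).

    rewards[m, x, a] is the one-step reward of action a in state x of the
    completely observed MDP associated with modulation state m.
    """
    expected = belief @ rewards[:, x, :]  # belief-weighted reward per action
    return int(np.argmax(expected))


if __name__ == "__main__":
    b = np.array([0.5, 0.5])          # uniform prior over modulation states
    b = update_belief(b, y=1)
    R = np.array([[[1.0, 0.0]],       # modulation state 0: action 0 is better
                  [[0.0, 2.0]]])      # modulation state 1: action 1 is better
    print(b, greedy_action(b, x=0, rewards=R))
```

Starting from a uniform belief, observing `y = 1` shifts probability mass toward modulation state 1, and under these toy rewards the belief-weighted rule selects action 1. A full SEP-POMDP treatment would optimize over the whole horizon rather than a single step; the one-step rule is only meant to show how estimation and control decouple.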
