Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

09/29/2022
by   Qinghua Liu, et al.
5

This paper introduces a simple efficient learning algorithms for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, which is thus named OMLE. We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low witness rank problems, tabular weakly-revealing/observable POMDPs and multi-step decodable POMDPs), but also many new challenging RL problems especially in the partially observable setting that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observation and function approximation, where we achieve the first sample complexity that is completely independent of the size of observation space; (2) well-conditioned low-rank sequential decision making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision making problems under SAIL condition, which unifies our existing understandings of model-based RL in both fully observable and partially observable settings. SAIL condition is identified by this paper, which can be viewed as a natural generalization of Bellman/witness rank to address partial observability.

READ FULL TEXT
research
09/29/2022

Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

Partial Observability – where agents can only observe partial informatio...
research
02/01/2021

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

Finding the minimal structural assumptions that empower sample-efficient...
research
07/01/2023

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

The general sequential decision-making problem, which includes Markov de...
research
06/21/2023

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

In this paper, we study representation learning in partially observable ...
research
08/21/2021

Sequential Stochastic Optimization in Separable Learning Environments

We consider a class of sequential decision-making problems under uncerta...
research
12/10/2021

Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

This paper proposes a new sequential model learning architecture to solv...
research
03/13/2013

Some Problems for Convex Bayesians

We discuss problems for convex Bayesian decision making and uncertainty ...

Please sign up or login with your details

Forgot password? Click here to reset