Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

03/24/2015
by Guido Montufar, et al.

It is well known that for any finite-state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward. For partially observable Markov decision processes (POMDPs), optimal memoryless policies are generally stochastic. We study the expected reward optimization problem over the set of memoryless stochastic policies. We formulate this as a constrained linear optimization problem and develop a corresponding geometric framework. We show that any POMDP has an optimal memoryless policy of limited stochasticity, which allows us to reduce the dimensionality of the search space. Experiments demonstrate that this approach enables better and faster convergence of the policy gradient on the evaluated systems.
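The claim that optimal memoryless policies for POMDPs are generally stochastic can be illustrated with a small toy problem. The sketch below is not taken from the paper: the two-state system, the discounted reward criterion, the discount factor, the uniform start distribution, and all function names are assumptions chosen for illustration. Both hidden states emit the same observation, so a memoryless policy reduces to a single probability p of choosing action "a".

```python
# Toy POMDP (hypothetical, not from the paper): two hidden states that emit
# the same observation, so a memoryless policy is a single number
# p = P(action "a"), applied identically in both states.
#   state 0: "a" -> move to state 1, reward 1;  "b" -> stay, reward 0
#   state 1: "b" -> move to state 0, reward 1;  "a" -> stay, reward 0

GAMMA = 0.9  # discount factor (an assumption of this sketch)


def discounted_return(p, gamma=GAMMA, sweeps=2000):
    """Expected discounted reward of the memoryless policy p, uniform start."""
    # Transition matrix and expected one-step reward induced by the policy.
    P = [[1.0 - p, p], [1.0 - p, p]]
    r = [p, 1.0 - p]
    # Iterative policy evaluation: V <- r + gamma * P V (a contraction).
    V = [0.0, 0.0]
    for _ in range(sweeps):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(2))
             for s in range(2)]
    return 0.5 * (V[0] + V[1])


def best_policy_on_grid(steps=100):
    """Brute-force search over p in {0, 1/steps, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=discounted_return)
```

In this toy system both deterministic policies (p = 0 and p = 1) get stuck in one state after at most one rewarded step, while randomizing keeps the system switching states and collecting reward. The grid search is only a stand-in for the policy gradient optimization mentioned in the abstract.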

research
06/01/2011

Nonapproximability Results for Partially Observable Markov Decision Processes

We show that for several variations of partially observable Markov decis...
research
05/27/2022

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

Reward optimization in fully observable Markov decision processes is equ...
research
01/23/2013

Solving POMDPs by Searching the Space of Finite Policies

Solving partially observable Markov decision processes (POMDPs) is highl...
research
06/02/2022

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

Policy gradient (PG) is a reinforcement learning (RL) approach that opti...
research
12/31/2020

Robust Asymmetric Learning in POMDPs

Policies for partially observed Markov decision processes can be efficie...
research
02/26/2019

Information Gathering in Decentralized POMDPs by Policy Graph Improvement

Decentralized policies for information gathering are required when multi...
research
07/04/2012

Existence and Finiteness Conditions for Risk-Sensitive Planning: Results and Conjectures

Decision-theoretic planning with risk-sensitive planning objectives is i...
