Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

05/27/2022
by Johannes Müller, et al.

Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies. Taking a similar perspective in the case of partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective subject to polynomial constraints. Based on this formulation, we present an approach for Reward Optimization in State-Action space (ROSA). We test this approach experimentally in maze navigation tasks. We find that ROSA is computationally efficient and can yield stability improvements over existing methods.
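To make the state-action view concrete, the following is a minimal sketch and not the ROSA method itself. It computes the state-action frequencies induced by a memoryless stochastic policy in a toy POMDP and checks that they satisfy linear flow constraints, with the reward being a linear function of those frequencies. The discounted, normalized formulation, the two-state model, and all names and numbers (`GAMMA`, `MU`, `P`, `BETA`, `R`, `state_action_frequencies`) are assumptions chosen for illustration; none of them come from the paper.

```python
# Toy POMDP with 2 states, 2 observations, 2 actions (all numbers are made up).
GAMMA = 0.9                      # assumed discount factor
MU = [0.5, 0.5]                  # initial state distribution mu(s)
# P[s][a][s2]: probability of moving to state s2 from state s under action a
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
# BETA[s][o]: probability of observing o in state s
BETA = [[0.8, 0.2], [0.3, 0.7]]
# R[s][a]: instantaneous reward
R = [[1.0, 0.0], [0.0, 2.0]]
N = 2                            # number of states, observations and actions


def state_policy(pi_obs):
    """Effective state policy tau(a|s) = sum_o beta(o|s) * pi(a|o)."""
    return [[sum(BETA[s][o] * pi_obs[o][a] for o in range(N))
             for a in range(N)] for s in range(N)]


def state_action_frequencies(pi_obs, horizon=2000):
    """Normalized discounted frequencies eta(s,a), truncated at `horizon` steps."""
    tau = state_policy(pi_obs)
    rho = MU[:]                  # state distribution at the current time step
    eta = [[0.0] * N for _ in range(N)]
    weight = 1.0 - GAMMA         # (1 - gamma) * gamma^t, starting at t = 0
    for _ in range(horizon):
        for s in range(N):
            for a in range(N):
                eta[s][a] += weight * rho[s] * tau[s][a]
        # Propagate the state distribution one step forward.
        rho = [sum(rho[s] * tau[s][a] * P[s][a][s2]
                   for s in range(N) for a in range(N)) for s2 in range(N)]
        weight *= GAMMA
    return eta


def expected_reward(eta):
    """The objective is linear in eta: sum_{s,a} r(s,a) * eta(s,a)."""
    return sum(R[s][a] * eta[s][a] for s in range(N) for a in range(N))


def flow_residual(eta):
    """Residual of the linear constraints that define the frequency polytope."""
    return [sum(eta[s2])
            - GAMMA * sum(P[s][a][s2] * eta[s][a]
                          for s in range(N) for a in range(N))
            - (1.0 - GAMMA) * MU[s2]
            for s2 in range(N)]


pi_obs = [[0.6, 0.4], [0.1, 0.9]]   # memoryless policy pi(a|o), rows sum to 1
eta = state_action_frequencies(pi_obs)
print("reward:", expected_reward(eta))
print("flow residuals:", flow_residual(eta))
```

In the fully observable case every point of the polytope cut out by these flow constraints corresponds to some policy, which is why reward optimization reduces to a linear program. With partial observability the effective state policy must factor through `BETA`, so only a subset of the polytope is reachable; the abstract describes that restriction as polynomial constraints.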


research
03/24/2015

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that for any finite state Markov decision process (MDP)...
research
10/14/2021

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

We consider the problem of finding the best memoryless stochastic policy...
research
09/22/2019

Faster saddle-point optimization for solving large-scale Markov decision processes

We consider the problem of computing optimal policies in average-reward ...
research
04/29/2011

Mean-Variance Optimization in Markov Decision Processes

We consider finite horizon Markov decision processes under performance m...
research
10/15/2020

Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes

In the theory of Partially Observed Markov Decision Processes (POMDPs), ...
research
09/24/2020

Robust Finite-State Controllers for Uncertain POMDPs

Uncertain partially observable Markov decision processes (uPOMDPs) allow...
research
07/09/2019

Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics

We propose a neural network architecture, called TransNet, that combines...
