Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

by Navdeep Kumar et al.

Robust Markov decision processes (MDPs) provide a general framework for modeling decision problems in which the system dynamics are changing or only partially known. Recent work established the equivalence between rectangular L_p robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs. However, a clear understanding of the policy improvement step is still lacking: for example, the greedy policy is known to be stochastic in general, but little is known about how each action contributes to it. In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators. We find that the greedy policy is a combination of the top-k actions, which provides a novel characterization of its stochasticity; the exact nature of the combination depends on the shape of the uncertainty set. Furthermore, our results allow us to compute the policy improvement step efficiently by a simple binary search, without resorting to an external optimization subroutine. Moreover, for L_1, L_2, and L_∞ robust MDPs, we can dispense with the binary search entirely and evaluate the optimal robust Bellman operators in closed form. Our work greatly extends existing results on solving s-rectangular L_p robust MDPs via regularized policy iteration and can be readily adapted to sample-based model-free algorithms.
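To make the top-k structure concrete, here is a minimal illustrative sketch (not the paper's exact operator, whose form depends on the uncertainty set and the regularizer): for an L_2-style regularized greedy step, the greedy policy at a state can be obtained by projecting the Q-values onto the probability simplex. The projection has the water-filling form max(q - tau, 0), so its support is exactly the top-k actions for some k, and the threshold tau can be found either by a sort (as below) or by binary search. The array `q` of robust Q-values is a toy example.

```python
import numpy as np

def simplex_projection(q):
    """Euclidean projection of q onto the probability simplex.

    The solution has the form max(q - tau, 0) for a scalar threshold
    tau, so it is supported on the top-k entries of q for some k --
    the same kind of top-k structure described in the abstract. Here
    tau is computed exactly via sorting; a binary search over tau
    (monotone in sum(max(q - tau, 0))) would work as well.
    """
    u = np.sort(q)[::-1]                 # Q-values in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(q) + 1)
    k = ks[u > (css - 1.0) / ks][-1]     # largest feasible support size
    tau = (css[k - 1] - 1.0) / k         # threshold for that support
    return np.maximum(q - tau, 0.0)

# Toy example: one state, four actions, illustrative robust Q-values.
q = np.array([1.0, 0.9, 0.2, -1.0])
pi = simplex_projection(q)               # stochastic, supported on the top-2 actions here
```

The resulting `pi` is a valid distribution that places positive probability only on the highest-valued actions, which is the stochastic-greedy behavior the abstract characterizes; the closed-form L_1, L_2, and L_∞ cases in the paper avoid even the threshold search.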
