Optimistic Value Iteration

10/02/2019
by Arnd Hartmanns, et al.

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset.
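The three steps named in the abstract (iterate from below, guess an upper bound, verify the guess) can be illustrated with a minimal Python sketch for maximum reachability probabilities. This is a simplified illustration under stated assumptions, not the paper's algorithm or the Modest Toolset implementation. The MDP encoding, the function names `bellman` and `optimistic_value_iteration`, the absolute-error guess `lower + eps`, and the rule of halving the tolerance after a failed guess are all choices made for this example. The verification step rests on a standard fact: if one application of the Bellman operator does not increase a candidate vector in any state, that vector lies above the least fixed point, which is the reachability probability. The simplified loop is not guaranteed to terminate on every model, so it gives up after `max_rounds` attempts.

```python
def bellman(mdp, goal, values):
    """Apply the max-reachability Bellman operator once.

    mdp[s] is a list of actions; each action is a list of
    (target_state, probability) pairs. This encoding is an assumption
    made for this sketch.
    """
    result = []
    for state, actions in enumerate(mdp):
        if state in goal:
            result.append(1.0)
        elif not actions:  # sink state: the goal cannot be reached
            result.append(0.0)
        else:
            result.append(max(sum(p * values[t] for t, p in action)
                              for action in actions))
    return result


def optimistic_value_iteration(mdp, goal, eps=1e-6, max_rounds=100):
    """Return (lower, upper) with lower <= true values <= upper.

    Simplified sketch: absolute-error guess, one verification step per
    guess, tolerance halved after each failed guess (all assumptions).
    """
    lower = [1.0 if s in goal else 0.0 for s in range(len(mdp))]
    tol = eps
    for _ in range(max_rounds):
        # Step 1: standard value iteration from below.
        while True:
            nxt = bellman(mdp, goal, lower)
            change = max(abs(a - b) for a, b in zip(nxt, lower))
            lower = nxt
            if change <= tol:
                break
        # Step 2: optimistically guess an upper bound.
        upper = [min(1.0, v + eps) for v in lower]
        # Step 3: verify the guess. If the operator does not increase
        # the guess anywhere, the guess is an upper bound.
        image = bellman(mdp, goal, upper)
        if all(a <= b for a, b in zip(image, upper)):
            return lower, upper
        tol /= 2  # guess failed: iterate further before guessing again
    raise RuntimeError("no verified upper bound within max_rounds")


# Hypothetical three-state example: state 1 is the goal, state 2 a sink.
example_mdp = [
    [  # state 0 has two actions
        [(1, 0.5), (0, 0.3), (2, 0.2)],  # true value of this action: 5/7
        [(1, 0.4), (2, 0.6)],            # true value of this action: 0.4
    ],
    [],  # state 1: goal
    [],  # state 2: sink
]
lower, upper = optimistic_value_iteration(example_mdp, goal={1})
print(lower[0], upper[0])
```

In this example the maximum probability of reaching the goal from state 0 is 5/7, and the returned interval encloses it with a width of at most `eps`.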


research
11/30/2022

The Smoothed Complexity of Policy Iteration for Markov Decision Processes

We show subexponential lower bounds (i.e., 2^Ω (n^c)) on the smoothed co...
research
04/13/2018

Sound Value Iteration

Computing reachability probabilities is at the heart of probabilistic mo...
research
01/30/2023

Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents

The optimized certainty equivalent (OCE) is a family of risk measures th...
research
05/26/2023

Accelerating Value Iteration with Anchoring

Value Iteration (VI) is foundational to the theory and practice of moder...
research
04/18/2016

A Repeated Signal Difference for Recognising Patterns

This paper describes a new mechanism that might help with defining patte...
research
07/04/2012

Point-Based POMDP Algorithms: Improved Analysis and Implementation

Existing complexity bounds for point-based POMDP value iteration algorit...
