Optimistic Planning by Regularized Dynamic Programming

02/27/2023
by Antoine Moulin, et al.

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments that are typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear kernel MDPs from a single stream of experience, and show that it achieves near-optimal statistical guarantees.
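Since the full text is not reproduced here, the sketch below illustrates the general recipe the abstract describes: fit a transition model by regularized least squares, then run approximate value iteration whose backups are both regularized and optimistic. This is a toy Python illustration under stated assumptions, not the authors' algorithm: it substitutes entropy (soft-max) regularization for the paper's regularizer, a count-based bonus for its notion of optimism, and one-hot features on a small tabular MDP for a genuine linear kernel MDP. All constants (tau, beta, lam, gamma, problem sizes) are arbitrary choices made for readability.

```python
# Illustrative sketch only (not the paper's algorithm): regularized,
# optimistic value iteration on a toy MDP whose transition model is
# estimated by ridge (least-squares) regression from sampled data.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9      # states, actions, discount (assumed)
tau, beta, lam = 0.1, 0.05, 1.0  # regularization temp., bonus scale, ridge (assumed)

# Ground-truth MDP (random), used only to generate transition data
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] is a dist over s'
R = rng.uniform(size=(S, A))

# --- estimate transitions by regularized least squares ---
# one-hot features phi(s, a); ridge regression onto next-state indicators
n_samples, d = 2000, S * A
Phi = np.zeros((n_samples, d))
Y = np.zeros((n_samples, S))
counts = np.zeros(d)
for i in range(n_samples):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])
    Phi[i, s * A + a] = 1.0
    Y[i, s_next] = 1.0
    counts[s * A + a] += 1
# closed-form ridge solution: W = (Phi^T Phi + lam I)^{-1} Phi^T Y
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)
P_hat = W.reshape(S, A, S)   # estimated transition model

# --- optimistic, regularized value iteration ---
# a soft-max (log-sum-exp) backup replaces the hard max over actions,
# and beta / sqrt(visits) inflates Q-values at poorly explored pairs
bonus = (beta / np.sqrt(np.maximum(counts, 1.0))).reshape(S, A)
V = np.zeros(S)
for _ in range(200):
    Q = R + bonus + gamma * P_hat @ V              # (S, A) optimistic Q
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))  # smooth max over actions

pi = np.exp((Q - V[:, None]) / tau)  # softmax (regularized) policy
print("optimistic V:", np.round(V, 3))
```

With one-hot features, the ridge fit reduces to smoothed empirical transition frequencies, so the backup above stays well behaved. The abstract's point is that with general linear function approximation, such least-squares models need not yield monotone or contractive backups, and the regularized updates are what make the analysis go through without those arguments.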


