Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

10/13/2022
by Anna Winnicki, et al.

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs). The variant combines stochastic approximation algorithms with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first takes a least squares approach: at each iteration, a new set of weights associated with the feature vectors is obtained via least squares minimization. The second is a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
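To make the second idea more concrete, the following is a minimal Python sketch of taking a few gradient descent steps towards a least squares fit of feature weights, followed by an averaging step. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names `least_squares_gd` and `blended_update`, the step size, the mixing weight, and the synthetic data are all hypothetical, and the averaging rule is only one simple way a stochastic approximation update could look.

```python
import random


def least_squares_gd(features, targets, theta, step_size, num_steps):
    """Run num_steps gradient steps on (1 / 2n) * sum_i (phi_i . theta - y_i)^2.

    Hypothetical helper: features is a list of feature vectors, targets a list
    of (possibly noisy) value estimates, theta the starting weight vector.
    """
    n = len(features)
    d = len(theta)
    theta = list(theta)
    for _ in range(num_steps):
        grad = [0.0] * d
        for phi, y in zip(features, targets):
            residual = sum(p * t for p, t in zip(phi, theta)) - y
            for j in range(d):
                grad[j] += residual * phi[j] / n
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta


def blended_update(theta_old, theta_target, mixing):
    """Assumed averaging rule: move a fraction `mixing` from theta_old to theta_target."""
    return [(1 - mixing) * a + mixing * b for a, b in zip(theta_old, theta_target)]


# Synthetic example: noisy but unbiased targets around a linear function.
rng = random.Random(0)
true_theta = [2.0, -1.0]
features = [[1.0, rng.uniform(-1.0, 1.0)] for _ in range(50)]
targets = [
    sum(p * t for p, t in zip(phi, true_theta)) + rng.gauss(0.0, 0.1)
    for phi in features
]

theta = [0.0, 0.0]
# A few gradient steps towards the least squares solution, then an averaging step.
partial_fit = least_squares_gd(features, targets, theta, step_size=0.5, num_steps=5)
theta = blended_update(theta, partial_fit, mixing=0.5)
print(theta)
```

With only a few gradient steps, `partial_fit` is an inexpensive approximation of the full least squares solution; increasing `num_steps` moves it closer to that solution at a higher per-iteration cost.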


research
12/20/2022

On the Convergence of Policy Gradient in Robust MDPs

Robust Markov decision processes (RMDPs) are promising models that provi...
research
12/27/2017

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for pol...
research
10/18/2021

Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory

In this paper, an algorithm for approximate evaluation of back-propagati...
research
02/08/2020

Provably Efficient Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value functio...
research
04/15/2013

Off-policy Learning with Eligibility Traces: A Survey

In the framework of Markov Decision Processes, off-policy learning, that...
research
01/29/2021

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

We consider Markov Decision Processes (MDPs) in which every stationary p...
research
10/16/2012

Sparse Q-learning with Mirror Descent

This paper explores a new framework for reinforcement learning based on ...
