On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

07/21/2020
by Jun Liu, et al.

A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help further settle the open problem. As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
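To make the algorithm described above concrete, the following is a minimal sketch of MCES on a toy stochastic shortest path problem. It is not taken from the paper. The five-state chain, the unit cost per move, the slip probability, first-visit averaging of returns, and all names (`step`, `mces`, `SLIP`, and so on) are assumptions chosen for illustration only.

```python
import random

# Hypothetical toy problem (not from the paper): a chain of states 0..4,
# where state 4 is the absorbing, cost-free goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # move left or move right
SLIP = 0.2           # assumed probability that a move goes the opposite way


def step(state, action, rng):
    """Return (next_state, cost). Every move costs 1; there is no discounting."""
    move = -action if rng.random() < SLIP else action
    next_state = min(max(state + move, 0), GOAL)
    return next_state, 1.0


def mces(n_episodes=20000, seed=0):
    """Monte Carlo Exploring Starts with first-visit averaging (one common variant)."""
    rng = random.Random(seed)
    non_goal = list(range(GOAL))
    q = {(s, a): 0.0 for s in non_goal for a in ACTIONS}   # estimated cost-to-go
    counts = {key: 0 for key in q}
    policy = {s: rng.choice(ACTIONS) for s in non_goal}

    for _ in range(n_episodes):
        # Exploring start: draw the initial state-action pair uniformly at random.
        state = rng.choice(non_goal)
        action = rng.choice(ACTIONS)
        trajectory = []
        while state != GOAL:
            next_state, cost = step(state, action, rng)
            trajectory.append((state, action, cost))
            state = next_state
            if state != GOAL:
                action = policy[state]   # follow the current greedy policy

        # Walk backwards to accumulate the undiscounted cost-to-go. Earlier
        # visits overwrite later ones, so the first visit's return is kept.
        cost_to_go = 0.0
        first_visit_return = {}
        for s, a, c in reversed(trajectory):
            cost_to_go += c
            first_visit_return[(s, a)] = cost_to_go

        # Policy evaluation: running average of the simulated returns.
        for key, g in first_visit_return.items():
            counts[key] += 1
            q[key] += (g - q[key]) / counts[key]

        # Policy improvement: act greedily (minimize cost) at visited states.
        for s in {s for s, _ in first_visit_return}:
            policy[s] = min(ACTIONS, key=lambda a: q[(s, a)])

    return q, policy


if __name__ == "__main__":
    q, policy = mces()
    print(policy)
```

Because a move can slip in either direction, every policy in this toy chain reaches the goal with probability one, so each simulated episode terminates. Whether such properness conditions match the assumptions under which the paper proves convergence is not stated in the abstract.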


