Concentration bounds for SSP Q-learning for average cost MDPs

06/07/2022
by Shaan ul Haque, et al.

We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
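The SSP-based scheme mentioned in the abstract converts the average-cost problem into a stochastic shortest path problem by designating a reference state whose value is pinned to zero, while a slower iterate tracks the optimal average cost. The following is a minimal illustrative sketch of such a two-timescale update on a toy MDP; the specific MDP, step-size schedules, exploration rule, and projection interval are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Illustrative two-timescale SSP Q-learning sketch (toy problem, not the paper's setup).
rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP: transition kernel P[s, a, s'] and one-step costs c[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.8, 0.2]]])
c = np.array([[1.0, 0.5],
              [2.0, 0.3]])

nS, nA = c.shape
ref = 0                      # reference state defining the equivalent SSP
Q = np.zeros((nS, nA))
lam = 0.0                    # slow iterate: estimate of the optimal average cost

s = 0
for n in range(1, 200_001):
    a_step = 1.0 / (1 + n // 100)   # faster timescale for the Q iterate
    b_step = 1.0 / (1 + n)          # slower timescale for the average-cost iterate

    # epsilon-greedy exploration (costs, so greedy = argmin)
    a = int(rng.integers(nA)) if rng.random() < 0.1 else int(Q[s].argmin())
    s_next = int(rng.choice(nS, p=P[s, a]))

    # SSP trick: the reference state's continuation value is treated as zero,
    # i.e. the chain "restarts" whenever it hits ref.
    cont = 0.0 if s_next == ref else Q[s_next].min()
    Q[s, a] += a_step * (c[s, a] - lam + cont - Q[s, a])

    # Slow update of the average-cost estimate, projected to a bounded interval
    # (at the fixed point, min_a Q(ref, a) = 0 and lam equals the optimal average cost).
    lam = float(np.clip(lam + b_step * Q[ref].min(), c.min(), c.max()))

    s = s_next

print("estimated average cost:", round(lam, 3))
print("greedy policy:", Q.argmin(axis=1))
```

For this toy MDP, always taking action 1 yields a uniform stationary distribution over the two states and average cost 0.4, which the slow iterate should approach; the alternative scheme the paper compares against (relative value iteration) would instead subtract a running offset from every Q-value rather than restart at a reference state.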


