Non-asymptotic Performances of Robust Markov Decision Processes

05/09/2021 ∙ by Wenhao Yang, et al.

In this paper, we study the non-asymptotic performance of the optimal robust policy, measured by the robust value function under the true transition dynamics. The optimal robust policy is computed from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three uncertainty sets, the $L_1$, $\chi^2$ and KL balls, under both the $(s,a)$-rectangular and the $s$-rectangular assumptions. Our results show that under the $(s,a)$-rectangular assumption, the sample complexity is roughly $O\big(|\mathcal{S}|^2|\mathcal{A}| / (\varepsilon^2\rho^2(1-\gamma)^4)\big)$ in the generative model setting and $O\big(|\mathcal{S}| / (\nu_{\min}\varepsilon^2\rho^2(1-\gamma)^4)\big)$ in the offline dataset setting. While prior works on non-asymptotic performance are restricted to the KL ball and the $(s,a)$-rectangular assumption, we also extend our results to the more general $s$-rectangular assumption, which leads to a larger sample complexity than the $(s,a)$-rectangular assumption.
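
To make the generative-model setting concrete, here is a minimal sketch (not the authors' code) of the $(s,a)$-rectangular $L_1$ case only: estimate a nominal transition model from generative-model samples, then run robust value iteration in which each $(s,a)$ inner problem is the worst-case expected next value over an $L_1$ ball around the empirical transition. It assumes that $\rho$ is the ball radius, and the function names `worst_case_l1`, `empirical_model`, `robust_value_iteration` and the sampler `sampler` are hypothetical. The inner problem uses the standard greedy closed form for an $L_1$ ball intersected with the probability simplex.

```python
import numpy as np


def worst_case_l1(p0, v, rho):
    """min_p p @ v over {p in the simplex : ||p - p0||_1 <= rho}.

    Greedy closed form: move up to rho/2 probability mass onto the
    lowest-value next state, taking it from the highest-value states first.
    """
    p = np.array(p0, dtype=float)
    v = np.asarray(v, dtype=float)
    lo = int(np.argmin(v))
    budget = min(rho / 2.0, 1.0 - p[lo])  # mass that can still be moved onto `lo`
    p[lo] += budget
    for s in np.argsort(v)[::-1]:  # highest-value states first
        if budget <= 0.0:
            break
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return float(p @ v)


def empirical_model(sample_next_state, S, A, n, rng):
    """Estimate P_hat[s, a, s'] from n generative-model draws per (s, a)."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return P_hat / n


def robust_value_iteration(P_hat, R, gamma, rho, tol=1e-8, max_iter=10_000):
    """Robust value iteration with an (s,a)-rectangular L1 ball around P_hat."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = R[s, a] + gamma * worst_case_l1(P_hat[s, a], V, rho)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)  # robust values and a greedy robust policy


# Toy usage on a random 3-state, 2-action MDP (all numbers are synthetic).
rng = np.random.default_rng(0)
S, A = 3, 2
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
R = rng.uniform(size=(S, A))


def sampler(s, a, g):
    # Hypothetical generative model: one next-state draw from P_true[s, a].
    return g.choice(S, p=P_true[s, a])


P_hat = empirical_model(sampler, S, A, n=200, rng=rng)
V, pi = robust_value_iteration(P_hat, R, gamma=0.9, rho=0.1)
```

Setting `rho=0.0` recovers ordinary value iteration on `P_hat`, which gives a quick sanity check: the robust values should never exceed the nominal ones. The $\chi^2$ and KL balls, and the $s$-rectangular case, would each need a different inner solver in place of `worst_case_l1`.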

research
∙ 02/10/2023

Towards Minimax Optimality of Model-based Robust Reinforcement Learning

We study the sample complexity of obtaining an ฯต-optimal policy in Robus...
research
∙ 12/02/2021

Sample Complexity of Robust Reinforcement Learning with a Generative Model

The Robust Markov Decision Process (RMDP) framework focuses on designing...
research
∙ 02/26/2023

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

We consider a reinforcement learning setting in which the deployment env...
research
∙ 05/22/2023

Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning

Offline reinforcement learning aims to find the optimal policy from a pr...
research
∙ 01/31/2018

An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path

In this paper, we consider a modified version of the control problem in ...
research
∙ 04/09/2019

Practical Open-Loop Optimistic Planning

We consider the problem of online planning in a Markov Decision Process ...
research
∙ 03/24/2022

Kullback-Leibler control for discrete-time nonlinear systems on continuous spaces

Kullback-Leibler (KL) control enables efficient numerical methods for no...