Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

09/16/2021
by Sarah Rathnam, et al.

In batch reinforcement learning, poorly explored state-action pairs can yield inaccurate learned models and, in turn, poorly performing policies. Various regularization methods mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Viewing the methods in this common form illuminates how the MDP structure and the state-action distribution of the batch data set influence their relative performance. We confirm the intuitions generated from the common framework through empirical evaluation across a range of MDPs and data collection policies.
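To make the common form concrete, below is a minimal sketch of a weighted average transition matrix: the regularized model is a convex combination of the empirical (MLE) transition matrix and an anchor matrix. The function name, the uniform anchor, and the fixed weight lam are illustrative assumptions, not the paper's exact parameterization of the three methods.

    import numpy as np

    def weighted_average_transition(P_hat, P_anchor, lam):
        """Regularized transition matrix: a convex combination of the
        empirical (MLE) transition matrix P_hat and an anchor matrix
        P_anchor, with mixing weight lam in [0, 1]."""
        return (1.0 - lam) * P_hat + lam * P_anchor

    # Example: 3 states; P_hat estimated from a sparse batch data set.
    P_hat = np.array([[0.9, 0.1, 0.0],
                      [0.0, 1.0, 0.0],   # poorly explored row
                      [0.5, 0.0, 0.5]])
    P_uniform = np.full((3, 3), 1.0 / 3)  # uniform anchor (an assumption)
    P_reg = weighted_average_transition(P_hat, P_uniform, lam=0.2)
    print(P_reg)

Because a convex combination of row-stochastic matrices is row-stochastic, each row of P_reg remains a valid probability distribution; larger lam pulls poorly explored rows toward the anchor, which is the smoothing effect the common framework exposes.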


