The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

by Sarah Rathnam et al.

Discount regularization, i.e., planning with a shorter horizon when computing the optimal policy, is a popular way to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood to function by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an optimal policy identical to that of planning with any prior on the transition matrix that has the same distribution for all states and actions. In effect, it acts like a prior with stronger regularization on state-action pairs with more transition data, which leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem yields an explicit formula for setting regularization parameters locally, per state-action pair, rather than globally. We demonstrate the failures of discount regularization, and how our state-action-specific method remedies them, on simple empirical examples as well as a medical cancer simulator.
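The claimed equivalence can be checked numerically. The sketch below is illustrative, not the paper's code: the random MDP, the mixing distribution `phi`, and all parameter values are made-up assumptions. It plans with a lowered discount on an estimated transition matrix, and separately plans with the full discount on that matrix shrunk toward a single arbitrary distribution (the same for every state-action pair) with mixing weight `gamma_prime / gamma`; the two greedy policies coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
gamma, gamma_prime = 0.9, 0.5  # evaluation discount vs. regularized discount

# Random "estimated" MDP (a stand-in for an empirical transition matrix).
P_hat = rng.random((n_states, n_actions, n_states))
P_hat /= P_hat.sum(axis=2, keepdims=True)  # normalize rows to distributions
R = rng.random((n_states, n_actions))

def optimal_policy(P, R, gamma, n_iter=2000):
    """Plain value iteration; returns the greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        V = (R + gamma * P @ V).max(axis=1)
    return (R + gamma * P @ V).argmax(axis=1)

# Discount regularization: plan with the lower discount gamma_prime.
pi_discount = optimal_policy(P_hat, R, gamma_prime)

# Equivalent view: plan with the full discount gamma on a transition matrix
# mixed toward one fixed distribution phi, identical for all state-action
# pairs, with weight gamma_prime / gamma on the data.
phi = rng.random(n_states)
phi /= phi.sum()
P_mixed = (gamma_prime / gamma) * P_hat + (1 - gamma_prime / gamma) * phi

pi_prior = optimal_policy(P_mixed, R, gamma)
print(np.array_equal(pi_discount, pi_prior))  # prints True
```

The equivalence follows because the mixed model's Bellman backup differs from the discount-regularized one only by the term `(gamma - gamma_prime) * phi @ V`, which is the same constant for every state and action, so it shifts all Q-values uniformly and leaves every argmax unchanged.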



Related papers:

- Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
- On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
- Regularized Policies are Reward Robust
- Sparsity Prior Regularized Q-learning for Sparse Action Tasks
- The Limits of Learning and Planning: Minimal Sufficient Information Transition Systems
- Discount Factor as a Regularizer in Reinforcement Learning
- Efficient Local Planning with Linear Function Approximation
