Path Consistency Learning in Tsallis Entropy Regularized MDPs

by   Ofir Nachum, et al.

We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this sub-optimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.


page 1

page 2

page 3

page 4


Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Maximum entropy deep reinforcement learning (RL) methods have been demon...

Sparsity Prior Regularized Q-learning for Sparse Action Tasks

In many decision-making tasks, some specific actions are limited in thei...

A Unified Framework for Regularized Reinforcement Learning

We propose and study a general framework for regularized Markov decision...

Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

In this paper, a sparse Markov decision process (MDP) with novel causal ...

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Algorithms based on the entropy regularized framework, such as Soft Q-le...

Parameterized MDPs and Reinforcement Learning Problems – A Maximum Entropy Principle Based Framework

We present a framework to address a class of sequential decision making ...

A short variational proof of equivalence between policy gradients and soft Q learning

Two main families of reinforcement learning algorithms, Q-learning and p...

Please sign up or login with your details

Forgot password? Click here to reset