An Alternative Softmax Operator for Reinforcement Learning

12/16/2016
by Kavosh Asadi et al.

A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum-utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that it is prone to misbehavior. In this work, we study a differentiable softmax operator that, among other properties, is a non-expansion, ensuring convergent behavior in learning and planning. We introduce a variant of the SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice.
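
For concreteness, below is a minimal NumPy sketch of the two operators discussed above, together with one possible reading of a Boltzmann policy with a state-dependent temperature. The operator the paper studies is mellowmax, mm_omega(x) = (1/omega) * log(mean(exp(omega * x))); the function names, the numerical-stability shifts, and the root-finding bracket are illustrative choices, not taken from the paper's code.

import numpy as np
from scipy.optimize import brentq

def boltzmann(x, beta):
    # Boltzmann softmax: sum_i x_i * exp(beta*x_i) / sum_j exp(beta*x_j).
    # The paper argues this operator can misbehave because it is not a
    # non-expansion for all beta.
    z = np.exp(beta * (x - np.max(x)))      # shift for numerical stability
    return np.dot(z, x) / np.sum(z)

def mellowmax(x, omega):
    # mm_omega(x) = (1/omega) * log(mean(exp(omega*x))), for omega > 0.
    # A differentiable operator that is a non-expansion.
    m = np.max(x)
    return m + np.log(np.mean(np.exp(omega * (x - m)))) / omega

def mellowmax_policy(q, omega):
    # Boltzmann policy over action values q with a state-dependent
    # temperature: beta is chosen so that the policy's expected value
    # matches mellowmax(q). This root-finding recipe is a sketch.
    adv = q - mellowmax(q, omega)
    if np.allclose(adv, 0.0):
        return np.full(q.shape, 1.0 / q.size)   # all values equal: uniform
    def f(beta):
        # scaled by exp(-beta * adv.max()); the scaling is positive,
        # so the root is unchanged and overflow is avoided
        return np.dot(np.exp(beta * (adv - adv.max())), adv)
    hi = 1.0
    while f(hi) <= 0.0:                         # grow bracket until sign flips
        hi *= 2.0
    beta = brentq(f, 0.0, hi)                   # f(0) < 0 < f(hi)
    w = np.exp(beta * (adv - adv.max()))
    return w / w.sum()

As a quick sanity check, mellowmax(np.array([1.0, 2.0]), omega=5.0) is about 1.86, which lies between the mean (1.5) and the maximum (2.0), as an operator interpolating between averaging and maximization should.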


Related research

03/14/2019  Reinforcement Learning with Dynamic Boltzmann Softmax Updates
Value function estimation is an important task in reinforcement learning...

12/02/2018  Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits
The softmax function has been primarily employed in reinforcement learni...

10/19/2020  Softmax Deep Double Deterministic Policy Gradients
A widely-used actor-critic reinforcement learning algorithm for continuo...

05/08/2018  Online normalizer calculation for softmax
The Softmax function is ubiquitous in machine learning, multiple previou...

11/21/2019  Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control
Deep networks have enabled reinforcement learning to scale to more compl...

06/09/2020  A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers
Neural Network (NN) classifiers can assign extreme probabilities to samp...

02/23/2018  Learning Latent Permutations with Gumbel-Sinkhorn Networks
Permutations and matchings are core building blocks in a variety of late...
