A Max-Min Entropy Framework for Reinforcement Learning

06/19/2021
by Seungyul Han, et al.

In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the maximum entropy RL framework in model-free sample-based learning. Whereas the maximum entropy RL framework guides policies toward states that will have high entropy in the future, the proposed max-min entropy framework aims to learn to visit states with low entropy and to maximize the entropy of these low-entropy states in order to promote exploration. For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework, based on the disentanglement of exploration and exploitation. Numerical results show that the proposed algorithm yields a drastic performance improvement over the current state-of-the-art RL algorithms.
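
For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of the standard maximum entropy RL objective: a discounted return in which each step's reward is augmented by the entropy of the policy's action distribution at the visited state. It is not the max-min algorithm proposed in the paper. The function names, the temperature `alpha`, the discount `gamma`, and the toy distributions are all assumptions chosen for illustration.

```python
import math


def policy_entropy(action_probs):
    """Shannon entropy (in nats) of one state's action distribution."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def soft_return(rewards, action_dists, alpha=0.2, gamma=0.99):
    """Discounted return with an entropy bonus added at every step.

    rewards[t] is the reward at step t; action_dists[t] is the policy's
    action distribution at the state visited at step t (both hypothetical).
    """
    total = 0.0
    for t, (r, dist) in enumerate(zip(rewards, action_dists)):
        total += (gamma ** t) * (r + alpha * policy_entropy(dist))
    return total


# Toy action distributions over four actions (illustrative only).
uniform = [0.25, 0.25, 0.25, 0.25]  # highest possible entropy: log(4)
greedy = [1.0, 0.0, 0.0, 0.0]       # deterministic policy: zero entropy
```

In this sketch, a trajectory through states where the policy is near-uniform collects a larger bonus than one through states where the policy is near-deterministic, which is the sense in which the maximum entropy objective favors reaching high-entropy states.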

research
01/31/2019

Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

In this paper, we present a new class of Markov decision processes (MDPs...
research
06/20/2023

Reward Shaping via Diffusion Process in Reinforcement Learning

Reinforcement Learning (RL) models have continually evolved to navigate ...
research
11/03/2019

Maximum Entropy Diverse Exploration: Disentangling Maximum Entropy Reinforcement Learning

Two hitherto disconnected threads of research, diverse exploration (DE) ...
research
02/15/2023

Optimal Sample Complexity of Reinforcement Learning for Uniformly Ergodic Discounted Markov Decision Processes

We consider the optimal sample complexity theory of tabular reinforcemen...
research
09/05/2020

A Hybrid PAC Reinforcement Learning Algorithm

This paper offers a new hybrid probably approximately correct (PAC) rei...
research
02/22/2021

Action Redundancy in Reinforcement Learning

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning p...
research
07/14/2020

Single-partition adaptive Q-learning

This paper introduces single-partition adaptive Q-learning (SPAQL), an a...
