Reinforcement Learning with Non-Exponential Discounting

09/27/2022
by   Matthias Schultheis, et al.

In reinforcement learning (RL), rewards are commonly discounted over time with an exponential function, which models time preference and bounds the expected long-term reward. In economics and psychology, by contrast, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal under a specific distribution of task termination times. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case of a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method that uses deep learning for function approximation. Further, we show how to approach the inverse RL problem, in which one tries to recover properties of the discount function from decision data. We validate the applicability of our approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
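
The link between hyperbolic discounting and a random termination time can be illustrated with a small numerical check. This is a minimal sketch under stated assumptions, not the method of the paper: it assumes the task ends with a constant but unknown hazard rate, and that this rate follows an exponential prior with mean `k`. Under that assumption, the expected probability that the task is still running at time `t` equals the hyperbolic discount `1 / (1 + k * t)`. The function names and the parameter `k` are hypothetical and chosen only for illustration.

```python
import math


def exponential_discount(t, rate):
    """Standard exponential discount exp(-rate * t)."""
    return math.exp(-rate * t)


def hyperbolic_discount(t, k):
    """Hyperbolic discount 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)


def expected_survival(t, k, n_steps=100_000):
    """Expected survival probability E[exp(-lam * t)] at time t.

    Assumption (for illustration only): the hazard rate lam is unknown
    and follows an exponential prior with mean k, i.e. density
    p(lam) = (1 / k) * exp(-lam / k). The expectation is approximated
    with a midpoint rule on [0, 50 * k]; the tail beyond that point
    carries prior mass of about exp(-50) and is ignored.
    """
    upper = 50.0 * k
    h = upper / n_steps
    total = 0.0
    for i in range(n_steps):
        lam = (i + 0.5) * h  # midpoint of the i-th sub-interval
        prior = math.exp(-lam / k) / k
        total += prior * math.exp(-lam * t) * h
    return total


if __name__ == "__main__":
    k = 1.0
    for t in (0.5, 2.0, 10.0):
        print(t, expected_survival(t, k), hyperbolic_discount(t, k),
              exponential_discount(t, k))
```

With a known hazard rate the same calculation gives the exponential discount back, so in this sketch the non-exponential shape comes entirely from the uncertainty about the termination rate.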


