Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

06/23/2020
by   Nelson Vadori, et al.

We introduce a novel framework to account for sensitivity to reward uncertainty in sequential decision-making problems. While the risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim to learn policies that are sensitive to the uncertain (stochastic) nature of the rewards themselves, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward, based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool, the chaotic variation, which can be rigorously interpreted as the risk measure of the martingale component associated with the cumulative reward process. On the reinforcement learning side, we incorporate this new risk-sensitive approach into model-free algorithms, both policy-gradient and value-function based, and illustrate its relevance on grid world and portfolio optimization problems.
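
For readers unfamiliar with the Doob decomposition mentioned above, the following is a minimal sketch of the standard textbook result applied to a cumulative reward process. The notation (r_s, R_t, A_t, M_t, and the filtration F_s) is assumed here for illustration and is not necessarily the paper's; in particular, the choice of what information F_{s-1} contains before the reward at step s is realized is an assumption, and discounting is omitted for brevity.

```latex
% Notation assumed for illustration; not taken from the paper.
% r_s: reward received at step s (integrable).
% \mathcal{F}_s: information available after step s.
\begin{align*}
R_t &= \sum_{s=1}^{t} r_s
  && \text{(cumulative reward, discounting omitted)} \\
A_t &= \sum_{s=1}^{t} \mathbb{E}\left[ r_s \mid \mathcal{F}_{s-1} \right]
  && \text{(predictable component)} \\
M_t &= \sum_{s=1}^{t} \left( r_s - \mathbb{E}\left[ r_s \mid \mathcal{F}_{s-1} \right] \right)
  && \text{(martingale component)} \\
R_t &= A_t + M_t
\end{align*}
```

Here A_t is known one step ahead, while M_t starts at M_0 = 0 and satisfies E[M_t | F_{t-1}] = M_{t-1}, so it collects only the part of each reward that could not be anticipated from past information. According to the abstract, the chaotic variation is a risk measure of this martingale component; its precise definition is given in the full text and is not reproduced here.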


research
02/22/2022

Approximate gradient ascent methods for distortion risk measures

We propose approximate gradient ascent algorithms for risk-sensitive rei...
research
06/22/2020

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

We study risk-sensitive reinforcement learning in episodic Markov decisi...
research
12/05/2015

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

In many sequential decision-making problems one is interested in minimiz...
research
11/04/2021

Model-Free Risk-Sensitive Reinforcement Learning

We extend temporal-difference (TD) learning in order to obtain risk-sens...
research
07/09/2019

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

We present a scheme for sequential decision making with a risk-sensitive...
research
08/19/2022

A Risk-Sensitive Approach to Policy Optimization

Standard deep reinforcement learning (DRL) aims to maximize expected rew...
research
10/22/2019

Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics

Controlling a biped robot to walk stably is a challenging task consideri...
