An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

07/17/2023
by   Yudong Luo, et al.

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance, while recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieve high return with low risk, in terms of both variance and Gini deviation, when other methods fail to learn a reasonable policy.
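
The abstract does not state a formula, so the following is only a minimal sketch under an assumption: it uses the standard definition of Gini deviation as half the expected absolute difference between two independent copies of the return, GD(X) = 0.5 * E|X1 - X2|. The function name `gini_deviation` and the sample returns are hypothetical, and this is a plain sample estimator, not the paper's policy gradient algorithm. It illustrates the scale sensitivity mentioned above: multiplying returns by a constant c multiplies variance by c^2 but Gini deviation only by |c|.

```python
import numpy as np


def gini_deviation(returns):
    """Sample estimate of 0.5 * E|X1 - X2| over all pairs of distinct samples.

    Assumes the standard definition of Gini deviation; the name and
    signature are illustrative, not taken from the paper.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    n = x.size
    if n < 2:
        raise ValueError("need at least two samples")
    # For sorted x (0-based index i), the sum of (x_j - x_i) over all i < j
    # equals sum_i (2*i - n + 1) * x_i, which avoids an O(n^2) pairwise loop.
    i = np.arange(n)
    pair_sum = np.sum((2 * i - n + 1) * x)
    num_pairs = n * (n - 1) / 2
    return 0.5 * pair_sum / num_pairs


# Hypothetical episode returns, then the same returns rescaled by 10.
returns = np.array([0.0, 1.0, 2.0, 10.0])
scaled = 10.0 * returns

var_ratio = np.var(scaled) / np.var(returns)                 # scales with c^2
gd_ratio = gini_deviation(scaled) / gini_deviation(returns)  # scales with |c|
print(var_ratio, gd_ratio)
```

Under this definition, a variance penalty tuned for one reward scale becomes 100 times heavier when rewards are multiplied by 10, whereas a Gini deviation penalty grows at the same rate as the return itself.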


research
04/22/2020

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning

We present a new per-step reward perspective for risk-averse control in ...
research
06/27/2012

Policy Gradients with Variance Related Risk Criteria

Managing risk in dynamic decision problems is of cardinal importance in ...
research
12/06/2019

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

In real-world decision-making problems, for instance in the fields of fi...
research
10/03/2020

Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning

In real-world decision-making problems, risk management is critical. Amo...
research
06/15/2022

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Keeping risk under control is often more crucial than maximizing expecte...
research
05/10/2022

Efficient Risk-Averse Reinforcement Learning

In risk-averse reinforcement learning (RL), the goal is to optimize some...
research
06/14/2019

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Direct optimization is an appealing approach to differentiating through ...
