Policy Gradients for CVaR-Constrained MDPs

05/12/2014
by   Prashanth L. A., et al.
We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini-batches, policy gradients and importance sampling. Both algorithms incorporate a CVaR estimation procedure along the lines of Bardou et al. [2009], which in turn is based on the Rockafellar-Uryasev representation of CVaR. Both also use the likelihood ratio principle to estimate two gradients: the gradient of the sum of one cost function (the objective of the SSP) and the gradient of the CVaR of the sum of another cost function (in the constraint of the SSP). The algorithms differ in how they approximate the CVaR estimates and the necessary gradients: the first uses stochastic approximation, while the second employs mini-batches in the spirit of Monte Carlo methods. We establish asymptotic convergence of both algorithms. Further, since estimating CVaR is related to rare-event simulation, we incorporate an importance-sampling-based variance reduction scheme into the proposed algorithms.
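
To make the two estimation styles concrete, here is a minimal sketch of CVaR estimation from sampled costs. It is an illustration under stated assumptions, not the paper's algorithms: it leaves out the policy gradient and importance sampling components entirely. It relies on the Rockafellar-Uryasev representation, CVaR_alpha(X) = min over xi of { xi + E[(X - xi)^+] / (1 - alpha) }, whose minimizer is the Value-at-Risk. The function names, the step-size exponent 0.75 and the starting point var0 are hypothetical choices made for this example.

```python
import random


def rockafellar_uryasev(xi, x, alpha):
    """Integrand of the Rockafellar-Uryasev representation for one sample x."""
    return xi + max(x - xi, 0.0) / (1.0 - alpha)


def cvar_mini_batch(samples, alpha):
    """Monte Carlo style estimate: plug the empirical quantile into the representation."""
    xs = sorted(samples)
    idx = min(int(alpha * len(xs)), len(xs) - 1)
    var = xs[idx]  # empirical Value-at-Risk at level alpha
    cvar = sum(rockafellar_uryasev(var, x, alpha) for x in xs) / len(xs)
    return var, cvar


def cvar_stochastic_approximation(sample_stream, alpha, var0=0.0, step_exponent=0.75):
    """Incremental estimate: one VaR step and one CVaR averaging step per sample.

    The step size 1 / n**step_exponent is an assumption made for this sketch.
    """
    var, cvar = var0, 0.0
    for n, x in enumerate(sample_stream, start=1):
        # Running average of the integrand evaluated at the current VaR iterate.
        cvar += (rockafellar_uryasev(var, x, alpha) - cvar) / n
        # Stochastic gradient step on the representation with respect to xi.
        indicator = 1.0 if x >= var else 0.0
        var -= (1.0 - indicator / (1.0 - alpha)) / n ** step_exponent
    return var, cvar


if __name__ == "__main__":
    rng = random.Random(0)
    costs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    print("mini-batch:", cvar_mini_batch(costs, 0.95))
    print("stochastic approximation:", cvar_stochastic_approximation(costs, 0.95))
```

The mini-batch version needs the whole sample before it can produce an estimate, while the stochastic approximation version updates after every sample and keeps only two scalars. For alpha close to 1, few samples land in the tail, which is where a variance reduction scheme such as importance sampling becomes relevant.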

research
06/26/2022

Combining Retrospective Approximation with Importance Sampling for Optimising Conditional Value at Risk

This paper investigates the use of retrospective approximation solution ...
research
02/22/2022

Approximate gradient ascent methods for distortion risk measures

We propose approximate gradient ascent algorithms for risk-sensitive rei...
research
07/24/2019

On importance-weighted autoencoders

The importance weighted autoencoder (IWAE) (Burda et al., 2016) is a pop...
research
02/02/2019

Stochastic Enumeration with Importance Sampling

Many hard problems in the computational sciences are equivalent to count...
research
07/21/2021

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Annealed importance sampling (AIS) and related algorithms are highly eff...
research
06/12/2014

Algorithms for CVaR Optimization in MDPs

In many sequential decision-making problems we may want to manage risk b...
research
03/01/2023

Forward-PECVaR Algorithm: Exact Evaluation for CVaR SSPs

The Stochastic Shortest Path (SSP) problem models probabilistic sequenti...
