Entropic Risk Measure in Policy Search

06/21/2019
by David Nass, et al.

With the increasing pace of automation, modern robotic systems need to act in stochastic, non-stationary, partially observable environments. A range of algorithms for finding parameterized policies that optimize for long-term average performance have been proposed in the past. However, the majority of the proposed approaches do not explicitly take into account the variability of the performance metric, which may lead to finding policies that, although performing well on average, can perform spectacularly badly in a particular run or over a period of time. To address this shortcoming, we study an approach to policy optimization that explicitly takes into account higher-order statistics of the reward function. In this paper, we extend policy gradient methods to include the entropic risk measure in the objective function and evaluate their performance in simulation experiments and on a real-robot task of learning a hitting motion in robot badminton.
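For concreteness, the entropic risk measure replaces the expected return E[R] with rho_beta(R) = (1/beta) log E[exp(beta R)]; a second-order expansion gives E[R] + (beta/2) Var[R], so beta < 0 trades average performance for lower variance, while beta -> 0 recovers the ordinary expectation. The NumPy sketch below illustrates the measure and a score-function (REINFORCE-style) estimate of its policy gradient; the function names and the batch-weighting form are our own hedged illustration of this idea, not the paper's implementation.

```python
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk measure rho_beta(R) = (1/beta) * log E[exp(beta * R)].

    beta < 0 penalizes variance (second-order expansion:
    E[R] + (beta/2) * Var[R]); beta -> 0 recovers the plain mean.
    Uses the log-sum-exp trick for numerical stability.
    """
    x = beta * np.asarray(returns, dtype=float)
    m = x.max()
    return (m + np.log(np.mean(np.exp(x - m)))) / beta

def risk_sensitive_pg_estimate(returns, score_grads, beta):
    """Score-function estimate of grad rho_beta over a batch of trajectories.

    Differentiating (1/beta) * log E_theta[exp(beta * R(tau))] gives
    (1/beta) * E[w(tau) * grad log p_theta(tau)], where w is exp(beta * R)
    normalized over the batch; for beta < 0, low-return trajectories
    receive the largest weights.
    """
    returns = np.asarray(returns, dtype=float)
    grads = np.asarray(score_grads, dtype=float)  # shape (num_traj, num_params)
    x = beta * returns
    w = np.exp(x - x.max())
    w /= w.sum()                                  # normalized exponential weights
    return (w[:, None] * grads).sum(axis=0) / beta

# Toy usage: two trajectories, two policy parameters, risk-averse beta.
rets = [1.0, -3.0]
score_grads = [[0.5, -0.2], [0.1, 0.4]]
print(entropic_risk(rets, beta=-0.5))             # lies below the mean of -1.0
print(risk_sensitive_pg_estimate(rets, score_grads, beta=-0.5))
```

In expectation this estimator reduces to the ordinary (baselined) policy gradient as beta -> 0, since the weights flatten toward 1/n and their first-order term contributes the usual return-weighted score.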

Related research

05/13/2019
Learning Novel Policies For Tasks
In this work, we present a reinforcement learning algorithm that can fin...

09/28/2021
Risk averse non-stationary multi-armed bandits
This paper tackles the risk averse multi-armed bandits problem when incu...

06/03/2011
Infinite-Horizon Policy-Gradient Estimation
Gradient-based approaches to direct policy search in reinforcement learn...

12/06/2019
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
In real-world decision-making problems, for instance in the fields of fi...

06/25/2019
Policy Optimization with Stochastic Mirror Descent
Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...

08/18/2021
A good body is all you need: avoiding catastrophic interference via agent architecture search
In robotics, catastrophic interference continues to restrain policy trai...

02/17/2018
Learning to Race through Coordinate Descent Bayesian Optimisation
In the automation of many kinds of processes, the observable outcome can...
