Identifying Reward Functions using Anchor Actions

07/15/2020
by Sinong Geng, et al.

We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. We name our method PQR, as it sequentially estimates the Policy, the Q-function, and the Reward function. PQR does not assume that the reward depends solely on the state; instead, it allows the reward to depend on the chosen action as well. Moreover, PQR allows for stochastic state transitions. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, which yields no reward. We present both estimators and algorithms for the PQR method. When the environment transition is known, we prove that the PQR reward estimator uniquely recovers the true reward. With unknown transitions, we bound the estimation error of PQR. Finally, we demonstrate the performance of PQR on synthetic and real-world datasets.
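To make the role of the anchor action concrete, here is a minimal tabular sketch of the identification idea. It is not the paper's deep estimator. It assumes a small finite MDP, a known transition tensor, a temperature of 1 for the energy-based policy (so that pi(a|s) = exp(Q(s,a) - V(s))), and an exactly known policy. The function names `soft_value_iteration` and `recover_reward` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def soft_value_iteration(reward, transition, gamma, iters=2000):
    """Return the energy-based (soft-optimal) policy for a tabular MDP.

    reward:     array of shape (S, A)
    transition: array of shape (S, A, S), transition[s, a, s2] = P(s2 | s, a)
    Assumes temperature 1, an illustrative choice.
    """
    v = np.zeros(reward.shape[0])
    for _ in range(iters):
        q = reward + gamma * transition @ v          # soft Bellman backup
        m = q.max(axis=1)
        v = m + np.log(np.exp(q - m[:, None]).sum(axis=1))  # stable log-sum-exp
    q = reward + gamma * transition @ v
    m = q.max(axis=1)
    v = m + np.log(np.exp(q - m[:, None]).sum(axis=1))
    return np.exp(q - v[:, None])


def recover_reward(policy, transition, gamma, anchor=0, anchor_reward=None):
    """Recover Q and the reward from a policy, given one anchor action.

    Uses V(s) = Q(s, a0) - log pi(a0 | s) together with
    Q(s, a0) = r(s, a0) + gamma * E[V(s') | s, a0], which is linear in V
    once the anchor reward r(s, a0) is known.
    """
    n_states = policy.shape[0]
    if anchor_reward is None:
        anchor_reward = np.zeros(n_states)           # "doing nothing" yields no reward
    log_pi = np.log(policy)
    p_anchor = transition[:, anchor, :]
    # Solve (I - gamma * P_a0) V = r_a0 - log pi(a0 | .)
    v = np.linalg.solve(np.eye(n_states) - gamma * p_anchor,
                        anchor_reward - log_pi[:, anchor])
    q = v[:, None] + log_pi                          # Q(s, a) = V(s) + log pi(a | s)
    reward = q - gamma * transition @ v              # invert the Bellman equation
    return reward, q
```

In this sketch the three stages mirror the P, Q, R ordering: the policy is taken as given, the anchor action pins down the value function and hence Q, and the reward follows by inverting the Bellman equation. Without the anchor constraint, many different rewards would be consistent with the same policy.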
