Reward Shaping via Meta-Learning

01/27/2019
by Haosheng Zou, et al.

Reward shaping is one of the most effective methods for tackling the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL). However, designing shaping functions usually requires considerable expert knowledge and hand-engineering, and the difficulty is compounded when multiple similar tasks must be solved. In this paper, we consider reward shaping on a distribution of tasks and propose a general meta-learning framework that automatically learns efficient reward shaping on newly sampled tasks, assuming only a shared state space but not necessarily a shared action space. We first derive the theoretically optimal reward shaping in terms of credit assignment in model-free RL. We then propose a value-based meta-learning algorithm to extract an effective prior over the optimal reward shaping. The prior can be applied directly to new tasks, or provably adapted to the task-posterior within a few gradient updates while solving the task. We demonstrate the effectiveness of our shaping through significantly improved learning efficiency and interpretable visualizations across various settings, notably including a successful transfer from DQN to DDPG.
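As a generic illustration of what reward shaping does, the sketch below uses the standard potential-based form, r' = r + γΦ(s') − Φ(s). This form is an assumption made for illustration only: the abstract does not state which shaping form the paper uses. The chain environment, the hand-written `potential` function, and all other names are hypothetical; in the setting described above the shaping would be learned rather than hand-designed.

```python
# Minimal sketch of potential-based reward shaping on a 5-state chain.
# Everything here (environment, potential, names) is hypothetical and
# only illustrates the generic technique, not the paper's algorithm.

def shaped_reward(reward, potential_s, potential_next, gamma):
    """Return r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential_next - potential_s

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one trajectory."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

def potential(state, goal=4):
    """Hand-designed potential: negative distance to the goal state."""
    return -abs(goal - state)

gamma = 0.9
states = [0, 1, 2, 3, 4]          # trajectory walking right to the goal
rewards = [0.0, 0.0, 0.0, 1.0]    # sparse reward: only the last step pays

# Shape every transition (s, r, s') along the trajectory.
shaped = [
    shaped_reward(r, potential(s), potential(s_next), gamma)
    for r, s, s_next in zip(rewards, states[:-1], states[1:])
]

plain_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope, so the two returns differ only by a
# quantity that depends on the start and end states of the trajectory.
telescoped = gamma ** len(rewards) * potential(states[-1]) - potential(states[0])
```

The first shaped reward is already positive even though the environment reward is zero there, which is the sense in which shaping eases credit assignment under sparse rewards. Because the shaping terms telescope along a trajectory, the shaped return equals the original return plus a term that depends only on the start and end states.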
