Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences

by Erdem Bıyık et al.

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, humans provide data in a variety of forms: these include instructions (e.g., natural language), demonstrations (e.g., kinesthetic guidance), and preferences (e.g., comparative rankings). Prior research has independently applied reward learning to each of these different data sources. However, there exist many domains where some of these information sources are not applicable or inefficient, while multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then proactively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and usability of our integrated framework.
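The pipeline the abstract describes, initializing a belief over the reward from demonstrations and then refining it with actively selected preference queries, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a reward linear in trajectory features, r(ξ) = w·φ(ξ), a Boltzmann-rational demonstration model, a softmax preference model, and a sampled belief over w; all function names and the max-entropy query heuristic are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup (illustrative): reward is linear in trajectory features,
# r(xi) = w . phi(xi); the belief over w is a set of weighted samples
# drawn uniformly from the unit sphere.
N_SAMPLES, N_FEATURES = 1000, 3
w_samples = rng.normal(size=(N_SAMPLES, N_FEATURES))
w_samples /= np.linalg.norm(w_samples, axis=1, keepdims=True)
log_weights = np.zeros(N_SAMPLES)  # log-importance weights of the samples

def update_from_demonstration(phi_demo, beta=1.0):
    """Boltzmann-rational demonstration model: P(demo | w) ~ exp(beta * w.phi)."""
    global log_weights
    log_weights = log_weights + beta * (w_samples @ phi_demo)

def update_from_preference(phi_a, phi_b, a_preferred):
    """Softmax preference model: P(A preferred | w) = sigmoid(w.(phi_a - phi_b))."""
    global log_weights
    d = w_samples @ (phi_a - phi_b)
    # log sigmoid(d) if A was preferred, log sigmoid(-d) otherwise
    log_weights = log_weights - np.logaddexp(0.0, -d if a_preferred else d)

def posterior_mean():
    """Current belief's mean estimate of the reward weights w."""
    p = np.exp(log_weights - log_weights.max())
    return (p / p.sum()) @ w_samples

def select_query(candidate_pairs):
    """Pick the preference query whose answer is most uncertain under the
    current belief -- a simple proxy for active, information-seeking querying."""
    p = np.exp(log_weights - log_weights.max())
    p /= p.sum()
    best, best_entropy = None, -1.0
    for phi_a, phi_b in candidate_pairs:
        # Predicted probability that the user prefers A, averaged over belief.
        pa = p @ (1.0 / (1.0 + np.exp(-(w_samples @ (phi_a - phi_b)))))
        h = -pa * np.log(pa + 1e-12) - (1 - pa) * np.log(1 - pa + 1e-12)
        if h > best_entropy:
            best, best_entropy = (phi_a, phi_b), h
    return best

# Demonstrations initialize the belief; preference queries then refine it.
update_from_demonstration(np.array([1.0, 0.0, 0.0]))
update_from_preference(np.array([0.0, 1.0, 0.0]),
                       np.array([0.0, -1.0, 0.0]), a_preferred=True)
```

After these two updates, the posterior mean leans toward positive weight on the first two features, and `select_query` tends to pick pairs that probe directions the belief is still uncertain about.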



