Learning Preferences for Interactive Autonomy

by Erdem Bıyık, et al.

When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. Reward functions, which specify the objective of a robot, are commonly used to encode this information. However, designing reward functions can be extremely challenging for complex tasks and environments. A promising approach is to learn reward functions from humans. Several recent robot learning works have embraced this approach and leverage human demonstrations to learn reward functions. Known as inverse reinforcement learning, this approach relies on a fundamental assumption: humans can provide near-optimal demonstrations to the robot. Unfortunately, this is rarely the case: human demonstrations are often suboptimal for reasons such as the difficulty of teleoperation, the robot's many degrees of freedom, or humans' cognitive limitations. This thesis takes a step toward learning reward functions from human users by using other, more reliable data modalities. Specifically, we study how reward functions can be learned using comparative feedback, in which the human user compares multiple robot trajectories instead of (or in addition to) providing demonstrations. To this end, we first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use these forms of human feedback to infer a reward function, which may be parametric or non-parametric. Next, we propose active learning techniques that enable the robot to ask for the comparative feedback that maximizes the expected information gained from the user's response. Finally, we demonstrate the applicability of our methods in a wide variety of domains, ranging from autonomous driving simulations to home robotics, and from standard reinforcement learning benchmarks to lower-body exoskeletons.
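
To make the pairwise-comparison and active-querying ideas concrete, here is a minimal sketch under assumptions that the abstract does not state: the reward is taken to be linear in trajectory features, the human is modeled with a softmax (Bradley-Terry) choice rule, and the belief over reward parameters is represented by a set of samples. All function names are hypothetical and chosen for illustration only; this is not the thesis's implementation.

```python
import numpy as np

def preference_likelihood(samples, phi_a, phi_b):
    """P(human prefers trajectory A over B) for each sampled reward vector.

    Assumption: reward is linear in features, R(xi) = w . phi(xi), and the
    human follows a softmax (Bradley-Terry) choice model.
    """
    z = samples @ (np.asarray(phi_a) - np.asarray(phi_b))
    return 1.0 / (1.0 + np.exp(-z))

def posterior_weights(samples, queries, prefers_a):
    """Importance weights over reward samples after observing comparisons."""
    log_w = np.zeros(len(samples))
    for (phi_a, phi_b), a_wins in zip(queries, prefers_a):
        z = samples @ (np.asarray(phi_a) - np.asarray(phi_b))
        # log sigmoid(z) if A was preferred, log sigmoid(-z) otherwise
        log_w += -np.logaddexp(0.0, -z if a_wins else z)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def information_gain(samples, weights, phi_a, phi_b):
    """Mutual information between the human's answer and the reward vector."""
    p = preference_likelihood(samples, phi_a, phi_b)
    return binary_entropy(weights @ p) - weights @ binary_entropy(p)

def select_query(samples, weights, candidates):
    """Index of the candidate pair with the highest expected information gain."""
    gains = [information_gain(samples, weights, a, b) for a, b in candidates]
    return int(np.argmax(gains))

# Two hypotheses about the reward, initially equally likely.
samples = np.array([[1.0, 0.0], [-1.0, 0.0]])
weights = np.array([0.5, 0.5])

# One pair of identical trajectories (uninformative) and one pair that
# separates the two hypotheses.
candidates = [
    (np.array([1.0, 1.0]), np.array([1.0, 1.0])),
    (np.array([5.0, 0.0]), np.array([0.0, 0.0])),
]
best = select_query(samples, weights, candidates)

# Suppose the human prefers trajectory A in the selected query.
weights = posterior_weights(samples, [candidates[best]], [True])
```

In this toy setup the robot skips the pair of identical trajectories, since any answer to it would leave the belief unchanged, and asks about the pair on which the two reward hypotheses disagree. Other feedback forms named above, such as best-of-many choices or rankings, would need a different likelihood function but could reuse the same selection loop.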


Active Preference-Based Gaussian Process Regression for Reward Learning

Designing reward functions is a challenging problem in AI and robotics. ...

Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences

Reward functions are a common way to specify the objective of a robot. A...

Maximizing BCI Human Feedback using Active Learning

Recent advancements in Learning from Human Feedback present an effective...

Choice Set Misspecification in Reward Inference

Specifying reward functions for robots that operate in environments with...

Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation

Socially aware robot navigation, where a robot is required to optimize i...

Reward Learning with Intractable Normalizing Functions

Robots can learn to imitate humans by inferring what the human is optimi...

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

One of the most successful paradigms for reward learning uses human feed...
