Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills

by Taku Yamagata, et al.

A promising approach to improving robustness and exploration in reinforcement learning is to collect human feedback and thereby incorporate prior knowledge of the target environment. It is, however, often too expensive to obtain enough feedback of good quality. To mitigate this issue, we aim to rely on a group of multiple experts (and non-experts) with different skill levels to generate enough feedback; such feedback can therefore be inconsistent and infrequent. In this paper, we build upon prior work, Advise, a Bayesian approach that attempts to maximise the information gained from human feedback, extending the algorithm to accept feedback from this larger group of humans, the trainers, while also estimating each trainer's reliability. We show how aggregating feedback from multiple trainers improves the overall accuracy of the feedback and makes the collection process easier in two ways. Firstly, this approach remains robust even when some of the trainers are adversarial. Secondly, having access to each trainer's estimated reliability provides a second layer of robustness and offers valuable information to the people managing the whole system, improving overall trust in it; it is an actionable tool for improving the feedback collection process or revising the reward function design if needed. We empirically show that our approach learns the reliability of each trainer accurately and uses it to maximise the information gained from the multiple trainers' feedback, even when some of the sources are adversarial.
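To make the aggregation idea concrete, here is a minimal sketch of how Advise-style policy shaping can combine feedback from several trainers with individual reliability estimates. In the original Advise formulation, a single trainer with consistency C yields P(a optimal) ∝ C^Δ(a) (1−C)^−Δ(a), where Δ(a) is the positive-minus-negative feedback count for action a; with multiple trainers, the per-trainer likelihoods multiply. The function name, array layout, and the fixed (rather than jointly estimated) consistency values are assumptions for illustration; the paper's actual reliability-estimation update may differ.

```python
import numpy as np

def feedback_policy(delta, consistency):
    """Probability that each action is optimal, from multiple trainers' feedback.

    delta[i, a]    : (#positive - #negative) feedback from trainer i for action a.
    consistency[i] : estimated probability that trainer i's feedback agrees
                     with the optimal action (reliability).
    Returns a probability distribution over actions.
    """
    # Each trainer contributes C_i^delta * (1 - C_i)^(-delta); combining in
    # log space, trainer i adds delta[i, a] * log(C_i / (1 - C_i)) to action a.
    log_odds = np.log(consistency) - np.log1p(-consistency)      # shape (n_trainers,)
    logits = (delta * log_odds[:, None]).sum(axis=0)             # shape (n_actions,)
    # Normalise with the usual max-subtraction trick for stability.
    p = np.exp(logits - logits.max())
    return p / p.sum()

# A reliable trainer (C = 0.9) endorsing action 0 dominates the distribution.
print(feedback_policy(np.array([[3.0, 0.0]]), np.array([0.9])))

# An adversarial trainer (C = 0.2 < 0.5) pushing action 1 has its feedback
# automatically discounted: log(C / (1 - C)) is negative, so its votes
# count *against* the actions it endorses.
print(feedback_policy(np.array([[2.0, 0.0],
                                [0.0, 3.0]]), np.array([0.9, 0.2])))
```

Note the design choice this exposes: once a trainer's estimated consistency falls below 0.5, the log-odds weight flips sign, so even adversarial feedback becomes informative rather than merely being ignored.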



Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds

We describe a method to use discrete human feedback to enhance the perfo...

Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning

This paper considers if a reward function learned via inverse reinforcem...

The growth and form of knowledge networks by kinesthetic curiosity

Throughout life, we might seek a calling, companions, skills, entertainm...

The 360-Degree Feedback Model as a Tool of Total Quality Management

The 360-degree feedback, also known as “multifaceted feedback”, is a man...

Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

We present a study on reinforcement learning (RL) from human bandit feed...

Maximizing BCI Human Feedback using Active Learning

Recent advancements in Learning from Human Feedback present an effective...

Multi source feedback based performance appraisal system using Fuzzy logic decision support system

In Multi-Source Feedback or 360 Degree Feedback, data on the performance...
