ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

01/16/2019
by   Jacob Beck, et al.

In autonomous vehicle (AV) control, allowing mistakes can be quite dangerous and costly in the real world. For this reason, we investigate methods of training an AV without allowing the agent to explore, instead having a human explorer collect the data. Supervised learning has been explored for AV control, but it encounters the issue of covariate shift: training data collected from an optimal demonstration consists only of the states induced by the optimal control policy, but at runtime the trained agent may encounter a vastly different state distribution with little relevant training data. To mitigate this issue, we have our human explorer make sub-optimal decisions. To keep our agent from replicating these sub-optimal decisions, supervised learning requires that we either erase these actions or replace them with the correct actions. Erasing is wasteful, and replacing is difficult, since it is not easy to know the correct action without driving. We propose an alternative framework that includes continuous scalar feedback for each action, marking which actions we should replicate, which we should avoid, and how sure we are. Our framework learns continuous control from sub-optimal demonstration and evaluative feedback collected before training. We find that a human demonstrator can explore sub-optimal states in a safe manner while still providing enough gradation in feedback to benefit learning. We call the data and feedback collection method "Backseat Driver." We call the more general learning framework ReNeg, since it learns a regression from states to actions given negative as well as positive examples. We empirically validate several models in the ReNeg framework, testing on lane-following with limited data. We find that the best solution is a generalization of mean-squared error and that it outperforms supervised learning on the positive examples alone.
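The abstract does not spell out the ReNeg loss, but one plausible reading of "a generalization of mean-squared error" is a per-example squared error weighted by the scalar feedback, so that feedback of +1 recovers ordinary supervised regression and negative feedback pushes the policy away from the demonstrated action. The PyTorch sketch below is only an illustration under that assumption; the function name, the clamping of the negative term, and the tensor shapes are hypothetical rather than taken from the paper.

```python
import torch

def feedback_weighted_mse(pred_actions, demo_actions, feedback):
    """Feedback-weighted squared error (illustrative; not the paper's exact loss).

    pred_actions: (batch, action_dim) actions produced by the policy network
    demo_actions: (batch, action_dim) actions taken by the human demonstrator
    feedback:     (batch,) scalar feedback in [-1, 1]; +1 imitates, -1 repels
    """
    sq_err = ((pred_actions - demo_actions) ** 2).sum(dim=1)  # per-example squared error
    loss = feedback * sq_err                                   # feedback == 1 gives plain MSE
    # Bound the repulsive (negative-feedback) term so it cannot dominate training;
    # this clamp is an assumption made for the sketch, not part of ReNeg itself.
    loss = torch.where(feedback < 0, loss.clamp(min=-1.0), loss)
    return loss.mean()

# Toy usage: 8 states, 2-D actions (e.g. steering and throttle), random feedback.
policy_actions = torch.randn(8, 2)
demo_actions = torch.randn(8, 2)
feedback = torch.rand(8) * 2 - 1
print(feedback_weighted_mse(policy_actions, demo_actions, feedback))
```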

