Gradient-free Policy Architecture Search and Adaptation

10/16/2017
by Sayna Ebrahimi, et al.

We develop a method for policy architecture search and adaptation via gradient-free optimization that can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward, we obtain a model that learns with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive the aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.
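To make the adaptation step concrete, here is a minimal sketch of gradient-free policy adaptation. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the optimizer, so a simple (1+λ) evolution strategy is swapped in, and the names `adapt_policy`, `toy_target_reward`, and the two-weight "policy" are hypothetical. The toy reward stands in for the reward an agent would collect in the target environment.

```python
import random


def adapt_policy(weights, reward_fn, iterations=200, population=8,
                 sigma=0.1, seed=0):
    """Adapt policy weights using reward evaluations only (no gradients).

    Assumption: a (1+lambda) evolution strategy. Each iteration samples
    `population` Gaussian perturbations of the current best weights and
    keeps the best candidate only if it improves the reward.
    """
    rng = random.Random(seed)
    best = list(weights)
    best_reward = reward_fn(best)
    for _ in range(iterations):
        candidates = [
            [w + rng.gauss(0.0, sigma) for w in best]
            for _ in range(population)
        ]
        scored = [(reward_fn(c), c) for c in candidates]
        top_reward, top = max(scored, key=lambda rc: rc[0])
        if top_reward > best_reward:  # elitism: never accept a worse policy
            best, best_reward = top, top_reward
    return best, best_reward


def toy_target_reward(weights):
    """Hypothetical stand-in for reward obtained in the target environment."""
    optimum = [0.5, -0.3]  # illustrative values, not from the paper
    return -sum((w - o) ** 2 for w, o in zip(weights, optimum))


# Weights as if learned from demonstrations in a source domain (assumed).
source_weights = [0.0, 0.0]
adapted, reward = adapt_policy(source_weights, toy_target_reward)
```

Because the update rule only accepts improvements, the adapted policy's reward never falls below that of the demonstrated starting point in this sketch, which loosely mirrors the abstract's emphasis on avoiding early catastrophic failures.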

research
08/13/2020

Network Architecture Search for Domain Adaptation

Deep networks have been used to learn transferable representations for d...
research
02/14/2018

Reinforcement Learning from Imperfect Demonstrations

Robust real-world learning should benefit from both demonstrations and i...
research
11/16/2021

GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving

Deep reinforcement learning (DRL) has been demonstrated to be effective ...
research
04/18/2019

Improving Interactive Reinforcement Agent Planning with Human Demonstration

TAMER has proven to be a powerful interactive reinforcement learning met...
research
10/13/2021

Safe Driving via Expert Guided Policy Optimization

When learning common skills like driving, beginners usually have domain ...
research
06/15/2023

Behavioral Cloning via Search in Embedded Demonstration Dataset

Behavioural cloning uses a dataset of demonstrations to learn a behaviou...
