Experience Filter: Using Past Experiences on Unseen Tasks or Environments

by Anil Yildiz et al.

One of the bottlenecks of training autonomous vehicle (AV) agents is the variability of training environments. Since learning optimal policies for unseen environments is often very costly and requires substantial data collection, it is computationally intractable to train the agent on every possible environment or task the AV may encounter. This paper introduces a zero-shot filtering approach that interpolates learned policies from past experiences to generalize to unseen ones. We use an experience kernel to correlate environments, and these correlations are then exploited to produce policies for new tasks or environments from previously learned policies. We demonstrate our approach on an autonomous vehicle driving through T-intersections with different characteristics, where its behavior is modeled as a partially observable Markov decision process (POMDP). We first construct compact representations of learned policies for POMDPs with unknown transition functions, given a dataset of sequential actions and observations. Then, we filter the parameterized policies of previously visited environments to generate policies for new, unseen environments. We demonstrate our approach on both an actual AV and a high-fidelity simulator. Results indicate that our experience filter offers a fast, low-effort, and near-optimal solution for creating policies for tasks or environments never seen before. Furthermore, the generated policies outperform a policy learned from the entire dataset collected across past environments, suggesting that the correlation among different environments can be exploited and irrelevant ones can be filtered out.
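The core idea described above can be illustrated with a minimal sketch: represent each visited environment by a feature vector, measure similarity with an experience kernel, and synthesize a policy for an unseen environment as a similarity-weighted combination of past policies. The squared-exponential kernel, the `length_scale` parameter, and the flat policy-parameter vectors below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def experience_filter(env_features, policies, new_env, length_scale=1.0):
    """Kernel-weighted interpolation of past policies for a new environment.

    env_features: (n, d) array, one feature vector per visited environment
    policies:     (n, k) array, one parameterized policy per environment
    new_env:      (d,) feature vector of the unseen environment
    """
    # Squared-exponential "experience kernel" over environment features
    # (an assumption; the actual kernel choice may differ).
    sq_dist = np.sum((env_features - new_env) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * length_scale**2))
    weights /= weights.sum()  # normalize so the weights sum to 1

    # New policy = similarity-weighted combination of past policies;
    # distant (irrelevant) environments receive near-zero weight.
    return weights @ policies
```

Note that environments far from the query receive exponentially small weight, which is one way "irrelevant" past experiences can be filtered out without discarding data up front.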


