EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

by Aravind Rajeswaran, et al.
Indian Institute of Technology, Madras
University of California, Berkeley
University of Washington

Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods, in which the real-world target domain is approximated by a simulated source domain, provide an avenue to tackle these challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from the target domain and approximate Bayesian methods, to progressively make it a better approximation. Thus, learning on a model ensemble, along with source domain adaptation, provides the benefit of both robustness and learning/adaptation.
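The two ingredients named in the abstract, adversarial training over an ensemble and adaptation of the distribution over source domains, can be illustrated with a minimal Python sketch. The abstract does not spell out either step, so the sketch makes two labeled assumptions: the adversarial step keeps only the lowest-return fraction of sampled trajectories, and the adaptation step is a simple likelihood reweighting of a fixed set of candidate models. All names (`sample_returns`, `worst_fraction`, `reweight`, `epsilon`) are hypothetical and chosen for illustration only.

```python
import random


def sample_returns(sample_params, rollout_return, n, rng):
    """Draw n source-domain parameter settings and score one rollout for each.

    sample_params and rollout_return are caller-supplied stand-ins for the
    simulator ensemble and the policy rollout (assumptions of this sketch).
    """
    draws = [sample_params(rng) for _ in range(n)]
    return [(p, rollout_return(p, rng)) for p in draws]


def worst_fraction(scored, epsilon):
    """Keep the epsilon fraction of (params, return) pairs with the lowest return.

    Assumed form of the adversarial step: the policy update would then use
    only these hard cases. At least one pair is always kept.
    """
    k = max(1, int(epsilon * len(scored)))
    return sorted(scored, key=lambda pair: pair[1])[:k]


def reweight(particles, weights, likelihood):
    """Approximate Bayesian update over candidate models: w_i proportional to w_i * L(p_i).

    likelihood(p) scores how well model p explains target-domain data.
    """
    updated = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(updated)
    if total == 0:
        raise ValueError("all candidate models have zero likelihood")
    return [w / total for w in updated]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy ensemble: a single scalar parameter (for example a mass) in [0.5, 2.0].
    scored = sample_returns(
        lambda r: r.uniform(0.5, 2.0),
        lambda p, r: -abs(p - 1.0),  # toy return: best when the parameter is 1.0
        n=20,
        rng=rng,
    )
    hard_cases = worst_fraction(scored, epsilon=0.1)
    print(len(hard_cases), "trajectories kept for the policy update")
```

A policy-gradient step would consume the pairs returned by `worst_fraction`; that step is left out because the abstract does not describe it.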



Learning causal representations for robust domain adaptation

Domain adaptation solves the learning problem in a target domain by leve...

MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds

Unsupervised domain adaptation (UDA) addresses the problem of distributi...

SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

As learning-based approaches progress towards automating robot controlle...

Learning Sampling Policies for Domain Adaptation

We address the problem of semi-supervised domain adaptation of classific...

Domain Adaptation Through Task Distillation

Deep networks devour millions of precisely annotated images to build the...

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

We propose a simple, practical, and intuitive approach for domain adapta...

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Generalizing policies across different domains with dynamics mismatch po...
