Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
State-of-the-art reinforcement learning algorithms mostly rely on direct interaction with their environment to collect millions of observations. This makes it hard to transfer their success to industrial control problems, where simulations are often very costly or do not exist at all. Furthermore, interacting with (and especially exploring in) the real, physical environment can lead to catastrophic events. We thus propose a novel model-based RL algorithm, called MOOSE (MOdel-based Offline policy Search with Ensembles), which can train a policy from a pre-existing, fixed dataset. It ensures that the dynamics models can accurately assess policy performance by constraining the policy to stay within the support of the data. We deliberately design MOOSE to be similar to the state-of-the-art model-free offline (a.k.a. batch) RL algorithms BEAR and BCQ, with the main difference being that our algorithm is model-based. We compare the algorithms on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance and find that MOOSE almost always outperforms its model-free counterparts by a wide margin.
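The general idea of offline, model-based policy search with an ensemble can be illustrated with a toy sketch. This is not the MOOSE algorithm itself: the abstract does not specify the model class, the policy class, or how the support constraint is enforced. Everything below is an assumption made for illustration, including the 1-D linear system, the bootstrap-fitted linear models, the linear policy `a = gain * s`, the box penalty on actions outside the logged range, and the worst-case aggregation over ensemble members.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed dataset of transitions (s, a, s') from a hypothetical 1-D system
# s' = 0.9*s + 0.5*a + noise. Logged actions cover only the range [-1, 1].
n = 500
S = rng.normal(size=(n, 1))
A = rng.uniform(-1.0, 1.0, size=(n, 1))
S_next = 0.9 * S + 0.5 * A + 0.01 * rng.normal(size=(n, 1))

def fit_ensemble(S, A, S_next, k=5):
    """Fit k linear dynamics models on bootstrap resamples (least squares)."""
    X = np.hstack([S, A])
    models = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))
        W, *_ = np.linalg.lstsq(X[idx], S_next[idx], rcond=None)
        models.append(W)  # weight matrix of shape (2, 1)
    return models

def penalized_return(gain, models, start_states, horizon=20, lam=1.0):
    """Worst-case (over the ensemble) predicted return of policy a = gain * s,
    minus a penalty for actions that leave the logged range [-1, 1]."""
    returns = []
    for W in models:
        s = start_states.copy()
        total = np.zeros(len(s))
        for _ in range(horizon):
            a = gain * s
            # Assumed stand-in for a support constraint: distance outside [-1, 1].
            out_of_support = np.maximum(np.abs(a) - 1.0, 0.0)
            reward = -(s ** 2)  # toy reward: keep the state near zero
            total += (reward - lam * out_of_support).ravel()
            s = np.hstack([s, a]) @ W  # roll the learned model forward
        returns.append(total.mean())
    return min(returns)

# Policy search by grid over the gain, using only the fixed dataset.
models = fit_ensemble(S, A, S_next)
gains = np.linspace(-3.0, 1.0, 41)
scores = [penalized_return(g, models, S[:100]) for g in gains]
best_gain = gains[int(np.argmax(scores))]
```

No environment interaction happens after the dataset is created: the policy is scored purely on rollouts through the learned models, and the penalty weight `lam` trades predicted return against staying close to the logged actions.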