An Empirical Study of Implicit Regularization in Deep Offline RL

07/05/2022
by Caglar Gulcehre et al.

Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior work has shown that neural networks trained with TD-learning and gradient descent can exhibit implicit regularization that manifests as under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called the effective rank, has been observed to collapse drastically during training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study of the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab. We observe that a direct association exists only in restricted settings and disappears in more extensive hyperparameter sweeps. We also empirically identify three phases of learning that explain the impact of implicit regularization on learning dynamics, and we find that bootstrapping alone is insufficient to explain the collapse of the effective rank. Further, we show that several other factors could confound the relationship between effective rank and performance, and we conclude that studying this association under simplistic assumptions could be highly misleading.
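The effective rank mentioned above is typically computed from the singular values of the penultimate-layer feature matrix. The sketch below illustrates one common definition used in the implicit under-parameterization literature: the smallest number of singular values whose cumulative mass reaches a (1 - delta) fraction of the total, with delta = 0.01. The function name, the feature-matrix shape, and the threshold here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Effective rank (srank_delta) of a feature matrix.

    Smallest k such that the top-k singular values account for at least
    a (1 - delta) fraction of the total singular-value mass.
    """
    # features: (num_states, feature_dim) penultimate-layer activations
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    # searchsorted returns the zero-based index of the first crossing; +1 gives the rank
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# Example: a rank-collapsed feature matrix has a small effective rank
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64))
print(effective_rank(low_rank))  # roughly 4, despite a 64-dimensional layer
```

Tracking this quantity over training is how rank collapse is usually diagnosed: a steadily shrinking effective rank indicates that the network is using an increasingly small subspace of its feature layer.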


Related research

10/27/2020  Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
We identify an implicit under-parameterization phenomenon in value-based...

12/09/2021  DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Despite overparameterization, deep networks trained via supervised learn...

10/19/2021  Offline Reinforcement Learning with Value-based Episodic Memory
Offline reinforcement learning (RL) shows promise of applying RL to real...

10/13/2022  Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in whic...

07/24/2023  A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
As with any machine learning problem with limited data, effective offlin...
