Observational Overfitting in Reinforcement Learning

12/06/2019
by Xingyou Song, et al.

A major component of overfitting in model-free reinforcement learning (RL) arises when the agent mistakenly correlates reward with spurious features of the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, and we use it to design multiple synthetic benchmarks by modifying only the observation space of an MDP. When an agent overfits to different observation spaces even though the underlying MDP dynamics are fixed, we term this observational overfitting. Our experiments expose intriguing properties, especially with regard to implicit regularization, and also corroborate results from previous work on RL generalization and supervised learning (SL).
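The central mechanism described above, a learner latching onto observation features that change from level to level while the dynamics and reward stay fixed, can be illustrated with a small supervised analogue. The sketch below is hypothetical and is not the authors' code. It assumes a linear state `s`, a reward `r = a · s`, and an observation built by concatenating a projection `W_c s` shared by every level with a level-specific projection `W_theta s`. All names (`W_c`, `W_theta`, `make_observation_map`, `fit_min_norm`, `train_and_test`) are illustrative. Fitting the reward with `numpy.linalg.lstsq`, which returns the minimum-norm least-squares solution, stands in for implicit regularization.

```python
import numpy as np


def make_observation_map(W_c, rng):
    """Hypothetical level: observation = [W_c s, W_theta s] with a fresh random W_theta."""
    d = W_c.shape[0]
    W_theta = rng.standard_normal((d, d))  # level-specific ("spurious") projection
    return lambda S: np.hstack([S @ W_c.T, S @ W_theta.T])


def fit_min_norm(O, r):
    """Minimum-norm least-squares fit of reward from observations."""
    w, *_ = np.linalg.lstsq(O, r, rcond=None)
    return w


def train_and_test(n_train_levels, d=5, n=200, seed=0):
    """Return (train MSE, test MSE on an unseen level) for a linear reward predictor."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(d)          # hypothetical true reward weights: r = a . s
    W_c = rng.standard_normal((d, d))   # projection shared by every level

    obs_blocks, rewards = [], []
    for _ in range(n_train_levels):
        observe = make_observation_map(W_c, rng)
        S = rng.standard_normal((n, d))  # states drawn from the same distribution on every level
        obs_blocks.append(observe(S))
        rewards.append(S @ a)
    O, r = np.vstack(obs_blocks), np.concatenate(rewards)
    w = fit_min_norm(O, r)
    train_mse = float(np.mean((O @ w - r) ** 2))

    # Unseen level: same states and reward, new level-specific projection.
    observe_test = make_observation_map(W_c, rng)
    S_test = rng.standard_normal((n, d))
    test_mse = float(np.mean((observe_test(S_test) @ w - S_test @ a) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for k in (1, 3):
        train_mse, test_mse = train_and_test(n_train_levels=k)
        print(f"levels={k}: train MSE={train_mse:.2e}, test MSE={test_mse:.2e}")
```

Under these assumptions, a single training level is fit exactly, but the minimum-norm solution places weight on the level-specific block and so degrades on a new level. With two or more generic levels, the stacked observations have full column rank, so the only exact fit is the one that ignores the level-specific block, and test error falls to numerical precision.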
