Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning

07/15/2022
by   Md Masudur Rahman, et al.

Deep reinforcement learning (RL) agents often overfit to their training environment, which leads to poor generalization. In this paper, we propose Thinker, a bootstrapping method that removes the adversarial effects of confounding features from observations in an unsupervised way and thereby improves the generalization of RL agents. Thinker first groups experience trajectories into several clusters. It then bootstraps these trajectories by applying a style transfer generator, which translates trajectories from one cluster's style to another's while preserving the content of the observations. The bootstrapped trajectories are then used for policy learning. Thinker is applicable to a wide range of RL settings. Experimental results show that Thinker generalizes better on the Procgen benchmark environments than the base algorithms and several data augmentation techniques.
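
The cluster-then-translate pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes a style transfer generator, whereas the code below swaps in simple per-channel mean and standard deviation matching as a stand-in for "style", and clusters with a plain k-means over flattened pixels. The function names (`kmeans`, `channel_stats`, `transfer_style`, `bootstrap_observations`), the `(N, H, W, C)` float layout with values in [0, 1], and the farthest-point initialization are all assumptions made for illustration.

```python
import numpy as np


def kmeans(features, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [features[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[d.argmax()])
    centroids = np.stack(centroids).astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels


def channel_stats(obs):
    """Per-channel mean and std over a batch of (N, H, W, C) observations."""
    return obs.mean(axis=(0, 1, 2)), obs.std(axis=(0, 1, 2)) + 1e-6


def transfer_style(obs, src_stats, dst_stats):
    """Stand-in for a learned generator: match channel statistics of dst."""
    (mu_s, sd_s), (mu_t, sd_t) = src_stats, dst_stats
    return np.clip((obs - mu_s) / sd_s * sd_t + mu_t, 0.0, 1.0)


def bootstrap_observations(obs, k=2, seed=0):
    """Cluster observations, then restyle each one toward another cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans(obs.reshape(len(obs), -1), k)
    stats = {j: channel_stats(obs[labels == j]) for j in range(k) if np.any(labels == j)}
    out = obs.copy()
    for i, src in enumerate(labels):
        src = int(src)
        targets = [j for j in stats if j != src]
        if targets:  # leave the observation unchanged if only one cluster exists
            dst = int(rng.choice(targets))
            out[i] = transfer_style(obs[i], stats[src], stats[dst])
    return out, labels
```

In this sketch, the restyled observations returned by `bootstrap_observations` would take the place of the originals in the policy-learning batch, with actions and rewards left untouched. Statistic matching only shifts global color and contrast, so it is far weaker than a trained generator at preserving content while changing style.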


