GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

by Jiajun Fan, et al.

Deep Q-Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it observed that the distribution of the acquired data changes during training. DQN found that this property can destabilize training, so it proposed effective methods to mitigate its downside. Instead of focusing on the unfavourable aspects, we find it critical for RL to narrow the gap between the estimated data distribution and the ground-truth data distribution, something supervised learning (SL) cannot do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). Many RL algorithms and techniques can be unified under the GDI paradigm, of which GPI is a special case. We provide theoretical proof of why GDI is better than GPI and how it works. Several practical algorithms based on GDI have been proposed to verify its effectiveness and generality. Empirical experiments demonstrate our state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a human normalized score (HNS) of 9620.98 and 1146.39 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL research into the journey of conquering human world records and to seek truly superhuman agents in both performance and efficiency.
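The abstract's core idea can be illustrated schematically: GPI alternates policy evaluation and policy improvement, while GDI adds a third step that iterates the behavior data distribution itself. A minimal sketch, assuming generic operator signatures; all function names here are illustrative placeholders, not the paper's actual formulation:

```python
def gpi(policy, evaluate, improve, steps):
    """Generalized Policy Iteration: alternate evaluation and improvement."""
    for _ in range(steps):
        value = evaluate(policy)          # policy evaluation
        policy = improve(policy, value)   # policy improvement
    return policy


def gdi(policy, data_dist, sample, evaluate, improve, optimize_dist, steps):
    """Generalized Data Distribution Iteration (sketch): in addition to the
    GPI steps, the behavior data distribution is iterated each round,
    easing the gap between estimated and ground-truth data distributions."""
    for _ in range(steps):
        data = sample(data_dist)                      # collect experience
        value = evaluate(policy, data)                # evaluation on that data
        policy = improve(policy, value)               # improvement
        data_dist = optimize_dist(data_dist, value)   # extra GDI step
    return policy, data_dist
```

With `optimize_dist` as the identity map, the GDI loop collapses back to GPI on a fixed data distribution, which mirrors the abstract's claim that GPI is a special case of GDI.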


Generalized Data Distribution Iteration

To obtain higher sample efficiency and superior final performance simult...

The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning

Non-stationarity arises in Reinforcement Learning (RL) even in stationar...

Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning

Deep Reinforcement Learning (RL) methods rely on experience replay to ap...

Entropy Regularized Reinforcement Learning with Cascading Networks

Deep Reinforcement Learning (Deep RL) has had incredible achievements on...

Podracer architectures for scalable Reinforcement Learning

Supporting state-of-the-art AI research requires balancing rapid prototy...

Deep Reinforcement Learning for Turbulence Modeling in Large Eddy Simulations

Over the last years, supervised learning (SL) has established itself as ...

Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

Rather than proposing a new method, this paper investigates an issue pre...
