The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

by Nishil Patel, et al.

Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.




TTR-Based Rewards for Reinforcement Learning with Implicit Model Priors

Model-free reinforcement learning (RL) provides an attractive approach f...

Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control

Reinforcement learning (RL) has recently proven great success in various...

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

The main challenge in developing effective reinforcement learning (RL) p...

Contrastive Value Learning: Implicit Models for Simple Offline RL

Model-based reinforcement learning (RL) methods are appealing in the off...

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

Data-efficient reinforcement learning (RL) in continuous state-action sp...

Playing 20 Question Game with Policy-Based Reinforcement Learning

The 20 Questions (Q20) game is a well known game which encourages deduct...

Causal Discovery with Reinforcement Learning

Discovering causal structure among a set of variables is a fundamental p...
