We identify incremental learning dynamics in transformers, where the
dif...
Transformers have gained increasing popularity in a wide range of
applic...
The grokking phenomenon as reported by Power et al. ( arXiv:2201.02177 )...
Learning generalizeable policies from visual input in the presence of vi...
Offline Reinforcement Learning promises to learn effective policies from...
Modern neural network performance typically improves as model size incre...