We explore the impact of parameter sparsity on the scaling behavior of
T...
Sparse mixture of expert architectures (MoEs) scale model capacity witho...
Adversarial robustness is a key desirable property of neural networks. I...
Effective scaling and a flexible task interface enable large language mo...
Large sparsely-activated models have obtained excellent performance in
m...
Transformers are widely applied to solve natural language understanding ...
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...
In the low-data regime, it is difficult to train good supervised models ...
Transfer learning has been recently popularized as a data-efficient
alte...
Transfer of pre-trained representations can improve sample efficiency an...
Uncertainty quantification for deep learning is a challenging open probl...
Recent progress in the field of reinforcement learning has been accelera...
We consider the core reinforcement-learning problem of on-policy value
f...
The estimation of an f-divergence between two probability distributions ...
Recent advances in deep reinforcement learning have made significant str...
We explore the sequential decision making problem where the goal is to
e...
Recommendation systems rely on historical user data to provide suggestio...
We consider the problem of online active learning to collect data for
re...