This work investigates the nuanced algorithm design choices for deep lea...
There is mounting empirical evidence of emergent phenomena in the
capabi...
Self-attention, an architectural motif designed to model long-range
inte...
We initiate the study of the natural multiplayer generalization of the
c...
We perform an experimental study of the dynamics of Stochastic Gradient
...