Sparsely activated transformers, such as Mixture of Experts (MoE), have ...
This report describes Microsoft's machine translation systems for the WM...
The Mixture of Experts (MoE) models are an emerging class of sparsely ac...
Multilingual Neural Machine Translation (NMT) enables one model to serve...
While pretrained encoders have achieved success in various natural langu...
Multilingual machine translation enables a single model to translate bet...
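Several of the abstracts above refer to sparsely activated Mixture of Experts (MoE) layers, in which a gating network routes each token to a small subset of expert feed-forward networks rather than activating every parameter. The following is a minimal illustrative sketch of such a layer with top-k routing, assuming PyTorch; the class name TopKMoE, the expert count, and the dimensions are assumptions made here for illustration, not the implementation used in any of the papers excerpted above.

    # Minimal sketch of a sparsely activated MoE feed-forward layer
    # with top-k token routing (illustrative assumption, not any cited
    # system's actual implementation).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
            super().__init__()
            self.k = k
            # Each "expert" is an ordinary Transformer feed-forward block.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )
            # The gate scores every expert for every token.
            self.gate = nn.Linear(d_model, num_experts)

        def forward(self, x):
            # x: (num_tokens, d_model)
            scores = F.softmax(self.gate(x), dim=-1)           # (tokens, experts)
            topk_scores, topk_idx = scores.topk(self.k, dim=-1)
            out = torch.zeros_like(x)
            # Each token is processed by only k experts, so only a small
            # fraction of the layer's parameters is activated per token.
            for e, expert in enumerate(self.experts):
                sel = topk_idx == e                 # (tokens, k) bool
                token_mask = sel.any(dim=-1)        # tokens routed to expert e
                if token_mask.any():
                    # Gate weight of expert e for each routed token.
                    w = (topk_scores * sel).sum(dim=-1)[token_mask].unsqueeze(-1)
                    out[token_mask] = out[token_mask] + w * expert(x[token_mask])
            return out

    # Example: route 16 token representations through the layer.
    layer = TopKMoE()
    tokens = torch.randn(16, 512)
    print(layer(tokens).shape)  # torch.Size([16, 512])

Because only k experts run per token, the total parameter count can grow with the number of experts while the per-token compute stays roughly constant, which is what "sparsely activated" refers to in the excerpts above.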