In this work, we develop and release Llama 2, a collection of pretrained...
Speech-to-speech translation (S2ST) enables spoken communication between...
Machine Translation (MT) has been widely used for cross-lingual
classifi...
Mixture-of-experts (MoE) models that employ sparse activation have
demon...
We introduce MuAViC, a multilingual audio-visual corpus for robust speec...
Multilingual machine translation (MMT) benefits from cross-lingual trans...
Multilingual machine translation models can benefit from synergy between...
Driven by the goal of eradicating language barriers on a global scale,
m...
State-of-the-art vision and vision-and-language models rely on large-sca...
Multi-task learning with an unbalanced data distribution skews model lea...
Performance on the most commonly used Visual Question Answering dataset ...
Sketching or doodling is a popular creative activity that people engage ...
This work proposes a new challenge set for multimodal classification,
fo...
This paper targets at visual counting, where the setup is to estimate th...
Numerous recent works have proposed pretraining generic visio-linguistic...
Much of vision-and-language research focuses on a small but diverse set ...
Understanding temporal information and how the visual world changes over...