Masked autoencoders (MAEs) have emerged recently as art self-supervised
...
We addressed the challenging task of video question answering, which req...
In this paper, we introduce Foley Music, a system that can synthesize
pl...
We focus on the task of generating sound from natural videos, and the so...
Recent deep learning approaches have achieved impressive performance on
...