Given a long untrimmed video and natural language queries, video groundi...
Understanding how events described or shown in multimedia content relate...
Visual entailment is a recently proposed multimodal reasoning task where...
Visual and textual modalities contribute complementary information about...
We study the problem of animating images by transferring spatio-temporal...
The abundance of multimodal data (e.g. social media posts) has inspired...
The news media shape public opinion, and often, the visual bias they con...
Computer vision systems currently lack the ability to reliably recognize...
In this paper, we examine the visual variability of objects across diffe...
There is more to images than their objective physical content: for examp...
In this technical report, we present our publicly downloadable implement...