Rapid advancements in large language models (LLMs) have enabled the
proc...
We propose VADER, a spatio-temporal matching, alignment, and change
summ...
Despite recent advances in Visual Question Answering (VQA), it remains a
...
The task of Visual Commonsense Reasoning is extremely challenging in the...