Visual question answering (VQA) is a challenging task that requires the
...
Current Vision and Language Models (VLMs) demonstrate strong performance...
While pretraining on large-scale image-text data from the Web has facili...
Large pre-trained models have proved to be remarkable zero- and
(prompt-...
Vision-and-language (V L) models pretrained on large-scale multimodal ...
Advances in Deep Reinforcement Learning have led to agents that perform ...
Modern Visual Question Answering (VQA) models have been shown to rely he...
A number of studies have found that today's Visual Question Answering (V...
Visual Question Answering (VQA) has received a lot of attention over the...
As machines have become more intelligent, there has been a renewed inter...
Recently, a number of deep-learning based models have been proposed for ...
We introduce the first dataset for sequential vision-to-language, and ex...
We present an approach to simultaneously perform semantic segmentation a...
We propose the task of free-form and open-ended Visual Question Answerin...