Recent empirical evidence indicates that transformer based in-context
le...
We study how vision-language models trained on Internet-scale data can b...
Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ...
The possibility of super-AIs taking over the world has been intensively
...
The problem of knowledge-based visual question answering involves answer...
Visual question answering requires a deep understanding of both images a...
Most recent state-of-the-art Visual Question Answering (VQA) systems are...
Most RNN-based image captioning models receive supervision on the output...
Visual question answering (VQA) and image captioning require a shared bo...
Visual Question Answering (VQA) deep-learning systems tend to capture
su...
There has long been debates on how we could interpret neural networks an...
AI systems' ability to explain their reasoning is critical to their util...
Answering visual questions need acquire daily common knowledge and model...
We present Dynamic Sampling Convolutional Neural Networks (DSCNN), where...
We propose a novel deep supervised neural network for the task of action...