How can one visually characterize people in a decade? In this work, we a...
The Visual Question Answering (VQA) task aspires to provide a meaningful...
Contrastive language-image pretraining (CLIP) encoders have been shown t...
We present a task and benchmark dataset for person-centric visual ground...
Important ethical concerns arising from computer vision datasets of peop...