Referenceless metrics (e.g., CLIPScore) use pretrained vision–language m...
Visual question answering (VQA) has the potential to make the Internet m...
We make a first attempt to characterize image accessibility on Wikipedia...
Few images on the Web receive alt-text descriptions that would make them...
Speakers' referential expressions often depart from communicative ideals...
Distillation efforts have led to language models that are more compact a...
In many areas, we have well-founded insights about causal structure that...
The ability to compositionally map language to referents, relations, and...
Images have become an integral part of online media. This has enhanced s...
Crime reporting is a prevalent form of journalism with the power to shap...
Referring is one of the most basic and prevalent uses of language. How d...