When applied to Image-to-text models, interpretability methods often pro...
Current captioning datasets, focus on object-centric captions, describin...
Image captioning models tend to describe images in an object-centric way...
We propose VALSE (Vision And Language Structured Evaluation), a novel
be...
Images can be described in terms of the objects they contain, or in term...
An ongoing debate in the NLG community concerns the best way to evaluate...
In the last few years, pre-trained neural architectures have provided
im...