Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics

05/21/2022
by Elisa Kreiss, et al.

Few images on the Web receive alt-text descriptions that would make them accessible to blind and low vision (BLV) users. Image-based NLG systems have progressed to the point where they can begin to address this persistent societal problem, but these systems will not be fully successful unless we evaluate them on metrics that guide their development correctly. Here, we argue against current referenceless metrics – those that don't rely on human-generated ground-truth descriptions – on the grounds that they do not align with the needs of BLV users. The fundamental shortcoming of these metrics is that they cannot take context into account, whereas contextual information is highly valued by BLV users. To substantiate these claims, we present a study with BLV participants who rated descriptions along a variety of dimensions. An in-depth analysis reveals that the lack of context-awareness makes current referenceless metrics inadequate for advancing image accessibility, requiring a rethinking of referenceless evaluation metrics for image-based NLG systems.

