Evaluating Open Question Answering Evaluation

by Cunxiang Wang et al.

This study focuses on the evaluation of Open Question Answering (Open-QA) tasks, which have become vital in artificial intelligence. Current automatic evaluation methods have shown limitations, and human evaluation remains the most reliable approach. We introduce a new task, QA Evaluation (QA-Eval), designed to assess how accurately AI-generated answers match standard answers in Open-QA. We evaluate these automatic methods against human-annotated results, measuring their performance with accuracy and F1 score. In particular, we investigate which methods correlate highly with human evaluation, deeming those more reliable. We also discuss the pitfalls of current methods, such as their inability to correctly judge responses that contain excessive information. We expect the dataset generated by this work to facilitate the development of more effective automatic evaluation tools, and believe the QA-Eval task and corresponding dataset will prove valuable for future research in this area.
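The scoring setup described above can be sketched in a few lines: an automatic evaluator predicts, for each model answer, whether it is correct, and those predictions are compared against human judgments via accuracy and F1. The function and variable names below are illustrative assumptions, not from the paper's released code.

```python
# Minimal sketch: score an automatic QA evaluator against human annotations,
# assuming each answer carries a binary correct/incorrect judgment.

def score_evaluator(human_labels, predicted_labels):
    """Return (accuracy, f1) of evaluator predictions vs. human judgments.

    Labels are booleans: True means the answer was judged correct.
    F1 is computed on the positive ("correct answer") class.
    """
    assert len(human_labels) == len(predicted_labels)
    n = len(human_labels)
    matches = sum(h == p for h, p in zip(human_labels, predicted_labels))
    accuracy = matches / n

    tp = sum(h and p for h, p in zip(human_labels, predicted_labels))
    fp = sum((not h) and p for h, p in zip(human_labels, predicted_labels))
    fn = sum(h and (not p) for h, p in zip(human_labels, predicted_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1

# Example: four human judgments vs. an evaluator's predictions.
human = [True, True, False, False]
pred = [True, False, False, True]
acc, f1 = score_evaluator(human, pred)  # acc = 0.5, f1 = 0.5
```

A method whose predictions agree more often with the human labels scores higher on both metrics, which is the sense in which the paper treats high-correlation methods as more reliable.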


