Generating Scientific Question Answering Corpora from Q&A forums

by Andre Lamurias, et al.

Question Answering (QA) is a natural language processing task that aims at retrieving relevant answers to user questions. While much progress has been made in this area, biomedical questions are still a challenge to most QA approaches, due to the complexity of the domain and the limited availability of training sets. We present a method to automatically extract question-article pairs from Q&A web forums, which can be used for document retrieval and QA tasks. The proposed framework extracts questions from selected forums, together with answers containing citations that can be mapped to a unique entry of a digital library. This way, QA systems based on document retrieval can be developed and evaluated using the question-article pairs annotated by users of these forums. We generated the SciQA corpus by applying our framework to three forums, obtaining 5,432 questions and 10,208 question-article pairs. We evaluated how the number of articles associated with each question and the number of votes on each answer affect the performance of baseline document retrieval approaches. We also trained a state-of-the-art deep learning model that obtained higher scores in most test batches than a model trained only on a dataset manually annotated by experts. The framework described in this paper can be used to update the SciQA corpus from the same forums as new posts are made, and to extend it with other forums whose answers are supported by documents.
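
The extraction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the digital library or the citation formats, so the PubMed URL and DOI patterns, the `posts` input structure, and the names `QAPair`, `extract_article_ids` and `build_pairs` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed citation patterns. The abstract only says that citations are
# mapped to a unique entry of a digital library; PubMed links and DOIs
# are used here purely as an example.
PUBMED_URL = re.compile(r"(?:pubmed\.ncbi\.nlm\.nih\.gov|ncbi\.nlm\.nih\.gov/pubmed)/(\d+)")
DOI = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")


@dataclass(frozen=True)
class QAPair:
    """One question-article pair, with the vote count of the citing answer."""
    question_id: str
    article_id: str
    votes: int


def extract_article_ids(answer_text):
    """Return the unique article identifiers cited in an answer, in order."""
    ids = [f"pmid:{m}" for m in PUBMED_URL.findall(answer_text)]
    # Strip trailing punctuation that often follows a DOI in running text.
    ids += [f"doi:{m.rstrip('.,;)')}" for m in DOI.findall(answer_text)]
    return list(dict.fromkeys(ids))  # de-duplicate while preserving order


def build_pairs(posts):
    """Build question-article pairs from a list of forum posts.

    Each post is assumed to be a dict with an "id" and a list of "answers",
    where each answer has "text" and "votes" (a hypothetical schema).
    """
    pairs = []
    for post in posts:
        for answer in post["answers"]:
            for article_id in extract_article_ids(answer["text"]):
                pairs.append(QAPair(post["id"], article_id, answer["votes"]))
    return pairs
```

Answers without a recognizable citation contribute no pairs, and keeping the vote count on each pair would make it possible to filter the corpus by answer score, in the spirit of the vote-based evaluation mentioned in the abstract.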




Top K Relevant Passage Retrieval for Biomedical Question Answering

Question answering is a task that answers factoid questions using a larg...

A Semi-supervised learning approach to enhance health care Community-based Question Answering: A case study in alcoholism

Community-based Question Answering (CQA) sites play an important role in...

V-Doc : Visual questions answers with Documents

We propose V-Doc, a question-answering tool using document images and PD...

Contributions to the Improvement of Question Answering Systems in the Biomedical Domain

This thesis work falls within the framework of question answering (QA) i...

When to Read Documents or QA History: On Unified and Selective Open-domain QA

This paper studies the problem of open-domain question answering, with t...

Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization

Question Answering (QA) tasks requiring information from multiple docume...

Simple and Effective Semi-Supervised Question Answering

Recent success of deep learning models for the task of extractive Questi...
