Mixture of Prompt Experts for Generalizable and Interpretable Question Answering

05/24/2023
by Chenglei Si, et al.

One of the ultimate goals of question answering (QA) is to deploy a system that can answer any type of question from users and refrain from answering when it does not know the answer. While recent advances in scaling large language models (LLMs) have brought significant improvements across various QA datasets, it remains difficult for a single model to generalize across question types that require distinct reasoning abilities. In this paper, we first provide empirical evidence that state-of-the-art LLMs such as Codex suffer from poor generalizability to question types beyond those seen in the prompt. To address this, we propose a Mixture-of-Prompt-Experts (MOPE) system that ensembles multiple specialized LLMs. Each specialized expert is built on the same backbone model (Codex) but with a prompt optimized for a different reasoning category: factual, multihop, mathematical, or commonsense reasoning. By strategically selecting the best specialized model for each given question, our MOPE system significantly outperforms any single specialized model on a collection of 12 QA datasets spanning the four reasoning types. Moreover, the attribution and agreement among the specialized expert models offer greater interpretability, enabling better selective question answering. Our human study further confirms that presenting the expert predictions and the answer selection process helps annotators more accurately decide when to trust the system's output. We release all code and data to facilitate future work.
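To make the system design concrete, here is a minimal Python sketch of the mixture-of-prompt-experts idea from the abstract. The prompt strings, the majority-vote expert selection, and the `abstain_below` agreement threshold are illustrative assumptions, not the paper's exact method: the paper describes strategically selecting the best expert per question and using inter-expert agreement for selective QA, but the concrete scoring below is hypothetical.

```python
# Minimal sketch of a Mixture-of-Prompt-Experts (MOPE) system, assuming a
# generic text-in/text-out LLM callable. The prompts, the majority-vote
# router, and the abstention threshold are illustrative, not the paper's
# exact implementation.
from collections import Counter
from typing import Callable, Optional

# Each "expert" is the same backbone LLM, specialized only by its prompt.
# These prompt texts are placeholders for the paper's optimized prompts.
EXPERT_PROMPTS = {
    "factual": "Answer the question with a short fact.\nQ: {q}\nA:",
    "multihop": "Answer step by step, combining multiple facts.\nQ: {q}\nA:",
    "math": "Solve the problem, showing the arithmetic.\nQ: {q}\nA:",
    "commonsense": "Use everyday commonsense to answer.\nQ: {q}\nA:",
}

def mope_answer(
    question: str,
    llm: Callable[[str], str],      # backbone model, e.g. a Codex API call
    abstain_below: float = 0.5,     # hypothetical agreement threshold
) -> Optional[str]:
    """Query every prompt expert, return the answer the experts agree on
    most, and abstain (return None) when agreement is too low to trust."""
    predictions = {
        name: llm(prompt.format(q=question)).strip()
        for name, prompt in EXPERT_PROMPTS.items()
    }
    # Inter-expert agreement doubles as an interpretable confidence signal:
    # a reader can inspect which expert produced which answer.
    answer, votes = Counter(predictions.values()).most_common(1)[0]
    agreement = votes / len(predictions)
    return answer if agreement >= abstain_below else None
```

Note that simple majority voting is only one possible router; the paper's selection strategy may instead score or learn which expert to trust per question, with the per-expert predictions still exposed for interpretability.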

Related research

- What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets (07/07/2020)
- Question Answering and Question Generation as Dual Tasks (06/07/2017)
- MetaQA: Combining Expert Agents for Multi-Skill Question Answering (12/03/2021)
- Why Does ChatGPT Fall Short in Answering Questions Faithfully? (04/20/2023)
- Mixture of Experts for Biomedical Question Answering (04/15/2022)
- UnifiedQA: Crossing Format Boundaries With a Single QA System (05/02/2020)
- HEAD-QA: A Healthcare Dataset for Complex Reasoning (06/11/2019)
