Simple Entity-Centric Questions Challenge Dense Retrievers

09/17/2021
by Christopher Sciavolino, et al.

Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples. However, in this paper, we demonstrate current dense models are not yet the holy grail of retrieval. We first construct EntityQuestions, a set of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?"), and observe that dense retrievers drastically underperform sparse methods. We investigate this issue and uncover that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training. We discuss two simple solutions towards addressing this critical problem. First, we demonstrate that data augmentation is unable to fix the generalization problem. Second, we argue a more robust passage encoder helps facilitate better question adaptation using specialized question encoders. We hope our work can shed light on the challenges in creating a robust, universal dense retriever that works well across different input distributions.
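
To make the construction of such questions concrete, here is a minimal sketch, assuming each Wikidata fact is a (subject, relation, object) triple and each relation maps to a hand-written question template. The `Fact` type, the `TEMPLATES` table, and `fact_to_question` are hypothetical names chosen for illustration, not the authors' code; only the place-of-birth wording is taken from the example in the abstract, and the answer is left as a placeholder.

```python
from typing import NamedTuple, Tuple


class Fact(NamedTuple):
    """A (subject, relation, object) triple, as assumed for this sketch."""
    subject: str
    relation: str
    obj: str


# Hypothetical template table keyed by a relation label.
# Only the place-of-birth wording comes from the abstract's example.
TEMPLATES = {
    "place of birth": "Where was {subject} born?",
}


def fact_to_question(fact: Fact) -> Tuple[str, str]:
    """Fill the relation's template with the subject; the object is the answer."""
    template = TEMPLATES[fact.relation]
    return template.format(subject=fact.subject), fact.obj


# The answer is a placeholder: the abstract does not state it.
question, answer = fact_to_question(
    Fact("Arve Furset", "place of birth", "<object from Wikidata>")
)
print(question)  # Where was Arve Furset born?
```

Because every question for a relation shares one pattern and differs only in the entity, a retriever that has not seen that pattern during training has to rely on the entity itself, which is the setting the abstract says is hard for dense models when the entity is uncommon.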

research
09/23/2021

Towards Universal Dense Retrieval for Open-domain Question Answering

In open-domain question answering, a model receives a text question as i...
research
10/11/2022

Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

Given its effectiveness on knowledge-intensive natural language processi...
research
10/04/2021

Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering

One key feature of dense passage retrievers (DPR) is the use of separate...
research
03/09/2023

Can a Frozen Pretrained Language Model be used for Zero-shot Neural Retrieval on Entity-centric Questions?

Neural document retrievers, including dense passage retrieval (DPR), hav...
research
09/02/2021

Challenges in Generalization in Open Domain Question Answering

Recent work on Open Domain Question Answering has shown that there is a ...
research
12/20/2022

What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

Dual encoders are now the dominant architecture for dense retrieval. Yet...
research
06/13/2023

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

BEIR is a benchmark dataset for zero-shot evaluation of information retr...
