Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker

by Arian Askari, et al.

Retrieval with extremely long queries and documents is a well-known and challenging task in information retrieval, commonly referred to as Query-by-Document (QBD) retrieval. In previous work, Transformer models specifically designed to handle long input sequences have not proven highly effective on QBD tasks. We propose a Re-Ranker based on the novel Proportional Relevance Score (RPRS) to compute the relevance score between a query and the top-k candidate documents. Our extensive evaluation shows that RPRS obtains significantly better results than state-of-the-art models on five different datasets. RPRS is also highly efficient: all documents can be pre-processed, embedded, and indexed before query time, which gives our re-ranker a complexity of O(N), where N is the total number of sentences in the query and candidate documents. Furthermore, our method addresses the low-resource training problem in QBD retrieval: it does not need large amounts of training data and has only three parameters, each with a limited range, which can be optimized with a grid search even when only a small amount of labeled data is available. Our detailed analysis shows that RPRS benefits from covering the full length of both the candidate documents and the queries.
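Because the abstract states that RPRS has only three parameters with limited ranges, tuning them by exhaustive grid search on a small labeled set is cheap. The sketch below shows that idea in Python. The parameter names `n`, `k1`, and `b`, the value ranges, and the `evaluate` callback are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the toy metric merely stands in for a real evaluation run of the re-ranker.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical names for the three RPRS parameters (illustration only).
Params = Tuple[int, float, float]


def grid_search(
    evaluate: Callable[[int, float, float], float],
    n_values: Sequence[int],
    k1_values: Sequence[float],
    b_values: Sequence[float],
) -> Tuple[Params, float]:
    """Score every (n, k1, b) combination and return the best one.

    `evaluate` is a hypothetical callback that would run the re-ranker with
    the given parameters on a small labeled set and return a metric where
    higher is better (e.g. recall or MAP).
    """
    best_params: Optional[Params] = None
    best_score = float("-inf")
    # itertools.product enumerates the full Cartesian grid of values.
    for n, k1, b in product(n_values, k1_values, b_values):
        score = evaluate(n, k1, b)
        if score > best_score:  # strict '>' keeps the first best on ties
            best_params, best_score = (n, k1, b), score
    if best_params is None:
        raise ValueError("empty parameter grid")
    return best_params, best_score


def toy_metric(n: int, k1: float, b: float) -> float:
    # Stand-in for a real evaluation run; peaks at n=3, k1=1.2, b=0.75.
    return -((n - 3) ** 2) - (k1 - 1.2) ** 2 - (b - 0.75) ** 2


best, score = grid_search(toy_metric, range(1, 6), [0.9, 1.2, 1.5], [0.5, 0.75, 1.0])
print(best)  # (3, 1.2, 0.75)
```

Each grid point costs one evaluation run, so this 5 by 3 by 3 grid needs 45 runs; with only three bounded parameters the total stays small, which is why a few labeled queries can suffice.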




Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

Dense retrieval systems conduct first-stage retrieval using embedded rep...

On the Interpolation of Contextualized Term-based Ranking with BM25 for Query-by-Example Retrieval

Term-based ranking with pre-trained transformer-based language models ha...

Pre-training Tasks for Embedding-based Large-scale Retrieval

We consider the large-scale query-document retrieval problem: given a qu...

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

We present a framework for improving the performance of a wide class of ...

Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments

Linear combination is a potent data fusion method in information retriev...

Efficient Deterministic Quantitative Group Testing for Precise Information Retrieval

The Quantitative Group Testing (QGT) is about learning a (hidden) subset...

XWalk: Random Walk Based Candidate Retrieval for Product Search

In e-commerce, head queries account for the vast majority of gross merch...
