Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

05/20/2021
by   Sebastian Hofstätter, et al.

An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking is to use a large pre-trained language model, e.g., BERT, to score every individual passage in a document and then aggregate the outputs by pooling or with additional Transformer layers. A major drawback of this approach is high query latency, since every passage in the document must be evaluated with BERT. To make matters worse, this inference cost and latency vary with document length: longer documents require more time and computation. To address this challenge, we adopt an intra-document cascading strategy that prunes the passages of a candidate document with a less expensive model, the Efficient Student Model (ESM), before running a more expensive and more effective scoring model, the Effective Teacher Model (ETM), e.g., BERT. We found it best to train the ESM via knowledge distillation from the ETM. This pruning allows us to run the ETM only on a small set of passages whose size does not vary with document length. Our experiments on the MS MARCO and TREC Deep Learning Track benchmarks suggest that the proposed Intra-Document Cascaded Ranking Model (IDCM) yields over 400% lower query latency while providing essentially the same effectiveness as state-of-the-art BERT-based document ranking models.
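The two-stage cascade described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's released code: the stand-in scorers `esm_score` and `etm_score` are hypothetical lexical proxies for the distilled student and the BERT teacher, and the max-pooling aggregation and passage length are illustrative choices.

```python
def esm_score(query: str, passage: str) -> float:
    # Stand-in for the Efficient Student Model: a cheap lexical-overlap proxy.
    return float(len(set(query.split()) & set(passage.split())))

def etm_score(query: str, passage: str) -> float:
    # Stand-in for the Effective Teacher Model (e.g., a BERT re-ranker):
    # average term-frequency of query terms in the passage.
    q_terms = query.split()
    words = passage.split()
    return sum(words.count(t) for t in q_terms) / (len(q_terms) or 1)

def idcm_rank(query: str, document: str,
              passage_len: int = 50, top_k: int = 3) -> float:
    # 1. Split the document into fixed-size passages.
    words = document.split()
    passages = [" ".join(words[i:i + passage_len])
                for i in range(0, len(words), passage_len)]
    if not passages:
        return 0.0
    # 2. Cascade stage one: prune with the cheap ESM, keeping only top_k
    #    passages regardless of how long the document is.
    pruned = sorted(passages, key=lambda p: esm_score(query, p),
                    reverse=True)[:top_k]
    # 3. Cascade stage two: score only the survivors with the expensive ETM
    #    and aggregate (max-pooling here) into a single document score.
    return max(etm_score(query, p) for p in pruned)
```

The key property the sketch preserves is that the expensive model runs on at most `top_k` passages, so its cost is bounded independently of document length.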

Related research

- Simplified TinyBERT: Knowledge Distillation for Document Retrieval (09/16/2020). Despite the effectiveness of utilizing BERT for document ranking, the co...
- An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking (02/08/2023). Although BERT-based ranking models have been commonly used in commercial...
- Multi-Stage Document Ranking with BERT (10/31/2019). The advent of deep neural networks pre-trained via language modeling tas...
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs (07/27/2023). Deep learning often faces the challenge of efficiently processing dynami...
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation (10/06/2020). The latency of neural ranking models at query time is largely dependent ...
- Learnt Sparsity for Effective and Interpretable Document Ranking (06/23/2021). Machine learning models for the ad-hoc retrieval of documents and passag...
