PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

01/05/2022
by Sophia Althammer, et al.

Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, the web domain offers large amounts of training data and involves query-to-passage or query-to-document retrieval tasks. In this paper we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. To use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM), which liberates DPR models from their limited input length. PARM retrieves documents at the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. The results per query paragraph are then aggregated into one ranked list for the whole query document. For the aggregation we propose vector-based aggregation with reciprocal rank fusion weighting (VRRF), which combines the advantages of rank-based aggregation and topical aggregation based on the dense embeddings. Experimental results show that VRRF outperforms rank-based aggregation strategies for dense document-to-document retrieval with PARM. We compare PARM to document-level retrieval and demonstrate higher retrieval effectiveness of PARM for lexical and dense first-stage retrieval on two different legal case retrieval collections. We investigate how to train the dense retrieval model for PARM on limited target data with labels at the paragraph level or the document level. In addition, we analyze how the results retrieved by lexical and dense retrieval with PARM differ.
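The paragraph-level aggregation described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formulas, so everything below is an assumption: `rrf_aggregate` uses the standard reciprocal rank fusion score 1 / (k + rank) with the conventional k = 60, and `vrrf_aggregate` is only one plausible reading of "vector-based aggregation with reciprocal rank fusion weighting", in which each candidate document is represented by the RRF-weighted sum of its retrieved paragraph vectors and scored by a dot product with the summed query paragraph vectors. The function names, input layout, and toy data are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict


def rrf_aggregate(per_paragraph_rankings, k=60):
    """Rank-based aggregation (assumed form: standard reciprocal rank fusion).

    per_paragraph_rankings holds one ranked list per query paragraph.
    Each entry is the id of the document that owns a retrieved paragraph,
    best hit first.
    """
    scores = defaultdict(float)
    for ranking in per_paragraph_rankings:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:
                # Count only the best-ranked paragraph of a document
                # within the list of a single query paragraph.
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def vrrf_aggregate(query_vectors, per_paragraph_hits, k=60):
    """Vector-based aggregation with RRF weights (hypothetical reading).

    query_vectors: one embedding (list of floats) per query paragraph.
    per_paragraph_hits: for each query paragraph, a ranked list of
    (doc_id, paragraph_vector) pairs, best hit first.
    """
    dim = len(query_vectors[0])
    # Assumption: the query document is the sum of its paragraph vectors.
    query_doc = [sum(vec[d] for vec in query_vectors) for d in range(dim)]

    # Assumption: a candidate document is the RRF-weighted sum of the
    # vectors of its retrieved paragraphs.
    doc_vectors = defaultdict(lambda: [0.0] * dim)
    for hits in per_paragraph_hits:
        for rank, (doc_id, vector) in enumerate(hits, start=1):
            weight = 1.0 / (k + rank)
            accumulated = doc_vectors[doc_id]
            for d in range(dim):
                accumulated[d] += weight * vector[d]

    scores = {
        doc_id: sum(q * c for q, c in zip(query_doc, vector))
        for doc_id, vector in doc_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data: three query paragraphs, each with its own ranked list.
    rankings = [["A", "B", "C"], ["B", "A"], ["B", "D"]]
    for doc_id, score in rrf_aggregate(rankings):
        print(doc_id, round(score, 5))
```

In this sketch the rank-based variant sees only positions in each list, while the vector-based variant also lets the content of the retrieved paragraphs influence the final score, which is the combination of rank information and topical information that the abstract attributes to VRRF.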

research
08/09/2021

DoSSIER@COLIEE 2021: Leveraging dense retrieval and summarization-based re-ranking for case law retrieval

In this paper, we present our approaches for the case law retrieval and ...
research
03/15/2022

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

Dense retrieval models, which aim at retrieving the most relevant docume...
research
04/17/2023

Statute-enhanced lexical retrieval of court cases for COLIEE 2022

We discuss our experiments for COLIEE Task 1, a court case retrieval com...
research
07/31/2023

Lexically-Accelerated Dense Retrieval

Retrieval approaches that score documents based on learned dense vectors...
research
12/16/2021

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

We present a framework for improving the performance of a wide class of ...
research
08/01/2023

On the Effects of Regional Spelling Conventions in Retrieval Models

One advantage of neural ranking models is that they are meant to general...
research
04/25/2023

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Neural retrieval models (NRMs) have been shown to outperform their stati...
