LeiBi@COLIEE 2022: Aggregating Tuned Lexical Models with a Cluster-driven BERT-based Model for Case Law Retrieval

by Arian Askari, et al.

This paper summarizes the approaches we submitted to the case law retrieval task of the Competition on Legal Information Extraction/Entailment (COLIEE) 2022. Our methodology consists of four steps. First, given a legal case as a query, we reformulate it by extracting meaningful sentences or n-grams. Second, we use the reformulated query case to retrieve an initial set of possibly relevant legal cases. Third, we re-rank this initial set. Finally, we aggregate the relevance scores produced by the first-stage and re-ranking models to improve retrieval effectiveness. In each step, we explore both well-known and novel methods. In particular, to shorten the query cases during reformulation, we extract unigrams using three statistical methods (KLI, PLM, and IDF-r) as well as embedding-based models (e.g., KeyBERT). We also investigate whether automatic summarization with the Longformer-Encoder-Decoder (LED) can produce an effective query representation for this retrieval task. Furthermore, we propose a novel cluster-driven re-ranking approach that leverages Sentence-BERT models, pre-tuned on large amounts of data, to embed sentences from the query and candidate documents. Finally, we employ a linear aggregation method to combine the relevance scores obtained by traditional IR models and neural models, aiming to incorporate both the semantic understanding of the neural models and the statistically measured topical relevance. We show that aggregating these relevance scores can improve overall retrieval effectiveness.
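The linear aggregation step can be illustrated with a minimal sketch. The abstract does not specify how scores are normalized or weighted, so the min-max normalization, the single interpolation weight `weight`, and the function names `min_max_normalize`, `aggregate_scores`, and `rank` below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] (assumed normalization; all-equal scores map to 0.0)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def aggregate_scores(
    lexical: Dict[str, float],
    neural: Dict[str, float],
    weight: float = 0.5,  # hypothetical interpolation weight, to be tuned
) -> Dict[str, float]:
    """Linearly combine normalized lexical and neural relevance scores.

    A candidate missing from one of the two runs contributes 0.0 for that run.
    """
    lex = min_max_normalize(lexical)
    neu = min_max_normalize(neural)
    doc_ids = set(lex) | set(neu)
    return {
        doc_id: weight * lex.get(doc_id, 0.0) + (1.0 - weight) * neu.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Return candidate ids by descending score, ties broken by id."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy scores for three candidate cases (made-up values, not results from the paper).
lexical_scores = {"case_a": 10.0, "case_b": 5.0, "case_c": 0.0}
neural_scores = {"case_a": 1.0, "case_b": 3.0, "case_c": 2.0}
combined = aggregate_scores(lexical_scores, neural_scores, weight=0.5)
ranking = rank(combined)
```

In this toy example the lexical run favors `case_a` and the neural run favors `case_b`; with equal weights the combined ranking places `case_b` first. In practice the weight would be tuned on held-out data.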




