Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

03/24/2022
by   Sebastian Hofstätter, et al.

Recent progress in neural information retrieval has demonstrated large gains in effectiveness, often at the cost of the efficiency and interpretability that classical approaches offer. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reductions dramatically lower ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors per document in two ways: it learns a single whole-word representation for each unique term in a document, and it learns to identify and remove word representations that are not essential to effective scoring. We employ explicit multi-task, multi-stage training to facilitate using very small vector dimensions. Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x while maintaining effectiveness. With just one dimension per token in its smallest setting, ColBERTer achieves index storage parity with the plaintext size, with very strong effectiveness results. Finally, we demonstrate ColBERTer's robustness on seven high-quality out-of-domain collections, yielding statistically significant gains over traditional retrieval baselines.
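The whole-word reduction and late-interaction matching described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: mean pooling as the aggregation step, the function names `aggregate_whole_words` and `max_sim_score`, and the two-dimensional vectors are all assumptions chosen for illustration. The single-vector component, the learned removal of non-essential words, and the optional lexical matching are left out.

```python
from collections import defaultdict


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_whole_words(subword_vectors, word_keys):
    """Merge subword vectors into one vector per unique whole word.

    Assumption: aggregation is a simple mean over all subword vectors
    that share the same word key within the document.
    """
    groups = defaultdict(list)
    for vec, key in zip(subword_vectors, word_keys):
        groups[key].append(vec)
    return {
        key: [sum(col) / len(vecs) for col in zip(*vecs)]
        for key, vecs in groups.items()
    }


def max_sim_score(query_vectors, doc_word_vectors):
    """Late interaction: every query vector picks its best-matching
    document word; the per-query maxima are summed into one score."""
    total = 0.0
    matches = []
    for q in query_vectors:
        best_word, best_score = max(
            ((word, dot(q, vec)) for word, vec in doc_word_vectors.items()),
            key=lambda pair: pair[1],
        )
        total += best_score
        matches.append(best_word)
    return total, matches


# Toy document: "retrieval" is split into two subwords, "models" is one.
subword_vectors = [[2.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
word_keys = ["retrieval", "retrieval", "models"]

doc_words = aggregate_whole_words(subword_vectors, word_keys)
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
score, matches = max_sim_score(query_vectors, doc_words)

print(len(subword_vectors), "subword vectors ->", len(doc_words), "word vectors")
print("score:", score, "matched words:", matches)
```

In this toy case three subword vectors shrink to two stored word vectors, and each query vector's contribution can be traced to a whole word rather than a subword fragment, which is the kind of interpretability gain the abstract refers to.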


research
11/18/2022

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) ...
research
07/11/2022

Topic-Grained Text Representation-based Model for Document Retrieval

Document retrieval enables users to find their required documents accura...
research
02/13/2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Recent progress in information retrieval finds that embedding query and ...
research
02/13/2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

This paper introduces a method called Sparsified Late Interaction for Mu...
research
11/02/2022

Multi-Vector Retrieval as Sparse Alignment

Multi-vector retrieval models improve over single-vector dual encoders o...
research
12/02/2021

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Neural information retrieval (IR) has greatly advanced search and other ...
research
10/03/2021

SDR: Efficient Neural Re-ranking using Succinct Document Representation

BERT based ranking models have achieved superior performance on various ...
