DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

03/01/2022
by   Yujia Zhou, et al.
0

Web search provides a promising way for people to obtain information and has been extensively studied. With the surgence of deep learning and large-scale pre-training techniques, various neural information retrieval models are proposed and they have demonstrated the power for improving search (especially, the ranking) quality. All these existing search methods follow a common paradigm, i.e. index-retrieve-rerank, where they first build an index of all documents based on document terms (i.e., sparse inverted index) or representation vectors (i.e., dense vector index), then retrieve and rerank retrieved documents based on similarity between the query and documents via ranking models. In this paper, we explore a new paradigm of information retrieval with neither sparse nor dense index but only a model. Specifically, we propose a pre-training model-based IR system called DynamicRetriever. As for this system, the training stage embeds the token-level and document-level information (especially, document identifiers) of the corpus into the model parameters, then the inference stage directly generates document identifiers for a given query. Compared with existing search methods, the model-based IR system has two advantages: i) it parameterizes the traditional static index with a pre-training model, which converts the document semantic mapping into a dynamic and updatable process; ii) with separate document identifiers, it captures both the term-level and document-level information for each document. Extensive experiments conducted on the public search benchmark MS MARCO verify the effectiveness and potential of our proposed new paradigm for information retrieval.

READ FULL TEXT
research
11/01/2018

DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval

Most neural Information Retrieval (Neu-IR) models derive query-to-docume...
research
02/10/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

We consider the large-scale query-document retrieval problem: given a qu...
research
05/24/2023

Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

Recently, a new paradigm called Differentiable Search Index (DSI) has be...
research
03/12/2022

Information retrieval for label noise document ranking by bag sampling and group-wise loss

Long Document retrieval (DR) has always been a tremendous challenge for ...
research
10/12/2021

Fast Forward Indexes for Efficient Document Ranking

Neural approaches, specifically transformer models, for ranking document...
research
11/21/2019

Separate and Attend in Personal Email Search

In personal email search, user queries often impose different requiremen...
research
11/27/2021

Pre-training Methods in Information Retrieval

The core of information retrieval (IR) is to identify relevant informati...

Please sign up or login with your details

Forgot password? Click here to reset